MATH50003 Combined Problem Sheet Solutions

Original Author: Dr. Sheehan Olver | Compiled by: Isaac Lee | Date: 16/03/2022 | Github Source

MATH50003 Problem Sheet 1

This problem sheet tests the representation of numbers on the computer, using modular arithmetic. We also use floating point rounding modes to implement interval arithmetic, and thereby produce rigorous bounds on the exponential.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

1. Binary representation

Problem 1.1 (C) What is the binary representation of 1/5? (Hint: use printbits to derive a proposed form.)

SOLUTION Note that

Hence we show that

$(0.001100110011\ldots)_2 = (2^{-3} + 2^{-4})(1.000100010001\ldots)_2 = (2^{-3} + 2^{-4})\sum_{k=0}^\infty 16^{-k} = \frac{2^{-3} + 2^{-4}}{1 - 2^{-4}} = \frac{3}{15} = \frac{1}{5}$

 

 

Problem 1.2 (C) What is $\pi$ to 5 binary places? Hint: recall that $\pi \approx 3.14$.

SOLUTION Note that

$\pi = 3.14159\ldots \approx 2 + 1 + \frac{1}{8} + \frac{1}{64} = 3.140625,$

which has the binary representation $(11.001001)_2$. Indeed:

Instead of simply guessing the above representation, we can repeatedly subtract the largest power of 2 that does not result in a negative number. For $\pi$ this procedure finds that we can write $\pi \approx 2^1 + 2^0 + 2^{-3} + 2^{-6} + \cdots$
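The greedy subtraction procedure above is easy to automate. Here is a Python sketch of the same idea (the sheet itself uses Julia; the helper name `binary_digits` is ours):

```python
import math

def binary_digits(x, places):
    """Greedy binary expansion of x > 0 down to 2**(-places):
    repeatedly subtract the largest power of 2 that keeps the
    remainder non-negative."""
    digits = []
    for p in range(int(math.floor(math.log2(x))), -places - 1, -1):
        if 2.0**p <= x:
            digits.append((p, 1))
            x -= 2.0**p
        else:
            digits.append((p, 0))
    return digits

# pi = 2^1 + 2^0 + 2^-3 + 2^-6 + ..., i.e. (11.001001...)_2
print(binary_digits(math.pi, 6))
```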

2. Integers

Problem 2.1 (C) With 8-bit signed integers, find the bits for the following: $10$, $120$, $-10$.

SOLUTION We can find the binary digits by repeatedly subtracting the largest power of 2 less than the number until we reach 0, e.g. $10 - 2^3 - 2 = 0$ implies $10 = (1010)_2$. Thus the bits are:

Similarly,

$120 = 2^6 + 2^5 + 2^4 + 2^3 = (1111000)_2$

Thus the bits are:

For negative numbers we perform the same trick after adding $2^p$ to make the number positive, e.g.,

$-10 = 2^8 - 10 \ ({\rm mod}\ 2^8) = 246 = 2^7 + 2^6 + 2^5 + 2^4 + 2^2 + 2 = (11110110)_2$

Thus the bits are:
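This bit pattern can be checked with a one-line Python sketch (the sheet itself uses Julia's printbits):

```python
# Two's complement of -10 in 8 bits: adding 2^8 makes it positive,
# then we read off the binary digits.
bits = format((-10) % 2**8, "08b")   # (-10) % 256 == 246
print(bits)                          # 11110110
```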

 

Problem 2.2 (C) What will Int8(120) + Int8(10) return?

SOLUTION It will return

$130 \ ({\rm mod}\ 2^8) = -126 \ ({\rm mod}\ 2^8)$, that is, $-126$: the addition overflows and wraps around modulo $2^8$.
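The wraparound can be simulated directly. A Python sketch (Python integers don't overflow, so the helper `int8`, a name of our choosing, masks by hand):

```python
def int8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127],
    mimicking Julia's Int8 overflow behaviour."""
    return (x + 128) % 256 - 128

print(int8(120 + 10))   # -126
```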

3. Floating point numbers

Problem 3.1 (C) What are the single precision F32 (Float32) floating point representations for the following:

$2, \quad 31, \quad 32, \quad 23/4, \quad (23/4) \times 2^{100}$

Check your answers using printbits.

SOLUTION Recall that we have σ,Q,S = 127,8,23. Thus we write

$2 = 2^{128-127}(1.00000000000000000000000)_2$

The exponent bits are those of

$128 = 2^7 = (10000000)_2$

Hence we get

We write

$31 = (11111)_2 = 2^{131-127}(1.1111)_2$

And note that $131 = (10000011)_2$. Hence we have:

On the other hand,

$32 = (100000)_2 = 2^{132-127}$

and $132 = (10000100)_2$, hence:

Note that

$23/4 = 2^{-2}(10111)_2 = 2^{129-127}(1.0111)_2$

and $129 = (10000001)_2$, hence we get:

Finally,

$23/4 \times 2^{100} = 2^{229-127}(1.0111)_2$

and $229 = (11100101)_2$, giving us:

Problem 3.2 (B) Let $m(y) = \min\{x \in F_{32} : x > y\}$ be the smallest single precision number greater than $y$. What are $m(2) - 2$ and $m(1024) - 1024$? Check your answers using the nextfloat command.

SOLUTION The next float after 2 is $2(1 + 2^{-23})$, hence we get $m(2) - 2 = 2^{-22}$:

Similarly, for $1024 = 2^{10}$ we find that the difference $m(1024) - 1024$ is $2^{10-23} = 2^{-13}$:
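The same check can be done without Julia's nextfloat by incrementing the bit pattern, since for positive floats the next float up has the next integer bit pattern. A Python sketch using struct (the function name `nextfloat32` is ours):

```python
import struct

def nextfloat32(x):
    """Next Float32 above a positive float x, via its bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits + 1))
    return y

print(nextfloat32(2.0) - 2.0)        # 2^-22
print(nextfloat32(1024.0) - 1024.0)  # 2^-13
```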

 

4. Arithmetic

Problem 4.1 (C) Suppose $x = 1.25$ and consider 16-bit floating point arithmetic (Float16). What is the error in approximating $x$ by the nearest floating point number ${\rm fl}(x)$? What is the error in approximating $2x$, $x/2$, $x + 2$ and $x - 2$ by $2 \otimes x$, $x \oslash 2$, $x \oplus 2$ and $x \ominus 2$?

SOLUTION None of these computations have errors since they are all exactly representable as floating point numbers.

Problem 4.2 (B) For what floating point numbers is $x \oslash 2 \neq x/2$ and $x \oplus 2 \neq x + 2$?

SOLUTION

Consider a normal $x = 2^{q-\sigma}(1.b_1\ldots b_S)_2$. Provided $q > 1$ we have

$x \oslash 2 = x/2 = 2^{q-\sigma-1}(1.b_1\ldots b_S)_2$

However, if $q = 1$ we lose a bit as we shift:

$x \oslash 2 = 2^{1-\sigma}(0.1b_1\ldots b_{S-1})_2$

and the property will be satisfied if $b_S = 1$. Similarly, if we are sub-normal, $x = 2^{1-\sigma}(0.b_1\ldots b_S)_2$ and we have

$x \oslash 2 = 2^{1-\sigma}(0.0b_1\ldots b_{S-1})_2$

and the property will be satisfied if $b_S = 1$. (Or NaN.)

Here are two examples:

For the second part, similar to the next problem, we see that $x \oplus 2 = x + 2$ holds provided $|x| < 2^{S+2}$; for larger $|x|$ the added 2 falls below the spacing of the floats:

We see this is sharp:
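The corresponding threshold in Float64 ($S = 52$, so the cutoff is $2^{54}$) can be checked directly in Python (the sheet itself works in Julia):

```python
# While the spacing between neighbouring floats is at most 2, x + 2 is
# exact; once the spacing is 4, the added 2 is a tie that rounds away.
exact = 2.0**53   # spacing 2 here, so +2 is exactly representable
lost = 2.0**54    # spacing 4 here, so +2 rounds back down (tie to even)
print(exact + 2 != exact)   # True
print(lost + 2 == lost)     # True
```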

 

Problem 4.3 (A) Explain why the following returns true. What is the largest floating point number y such that y + 1 ≠ y?

SOLUTION

Writing $10 = 2^3(1.01)_2$ we have

${\rm fl}(10^{100}) = {\rm fl}(2^{300}(1 + 2^{-2})^{100}) = 2^{332}(1.b_1\ldots b_{52})_2$

where the bits bk are not relevant. We then have:

${\rm fl}(10^{100}) \oplus 1 = {\rm fl}(2^{332}[(1.b_1\ldots b_{52})_2 + 2^{-332}]) = {\rm fl}(10^{100})$

since $2^{-332}$ is below the necessary precision.

The largest floating point number satisfying the condition is $y = 2^{53} - 1$, since $S = 52$. First note that $2^{53}$ does not satisfy the condition:

We can however successfully create the previous float $2^{53} - 1$ by subtracting (explain why this works while x + 1 fails):

And this satisfies:
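The same claims hold in Python's double precision and can be verified directly (a sketch; the sheet works in Julia):

```python
# y = 2^53 - 1 is exactly representable (53 significant bits) and
# y + 1 = 2^53 is also exact, so the condition y + 1 != y holds;
# at 2^53 itself, the +1 is a tie that rounds back down.
y = 2.0**53 - 1
print(y + 1 != y)              # True
print(2.0**53 + 1 == 2.0**53)  # True
```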

Problem 4.4 (B) What are the exact bits for $1/5$ and $1/5 + 1$ computed using half-precision arithmetic (Float16) (using default rounding)?

SOLUTION

We saw above that

$1/5 = 2^{-3}(1.100110011001\ldots)_2 \approx 2^{-3}(1.1001100110)_2$

where the $\approx$ is rounded to the nearest 10 bits (in this case rounded down). We write $-3 = 12 - 15$, hence $q = 12 = (01100)_2$, so we get the bits:

Adding 1 we get:

$1 + 2^{-3}(1.1001100110)_2 = (1.0011001100110)_2 \approx (1.0011001101)_2$

Here we write the exponent as $0 = 15 - 15$, where $q = 15 = (01111)_2$. Thus we get:

Problem 4.5 (A) Explain why the following does not return 1. Can you compute the bits explicitly?

 

SOLUTION For the last problem, note that

$\frac{1}{10} = \frac{1}{2} \cdot \frac{1}{5} = 2^{-4}(1.100110011001\ldots)_2$

hence we have

${\rm fl}\left(\frac{1}{10}\right) = 2^{-4}(1.1001100110)_2$

and

${\rm fl}\left(1 + \frac{1}{10}\right) = {\rm fl}((1.00011001100110\ldots)_2) = (1.0001100110)_2$

Thus

${\rm fl}(1.1) \ominus 1 = (0.0001100110)_2 = 2^{-4}(1.1001100000)_2$

and hence we get

${\rm fl}(0.1) \oslash ({\rm fl}(1.1) \ominus 1) = {\rm fl}\left(\frac{(1.1001100110)_2}{(1.1001100000)_2}\right) \neq 1$

To compute the bits explicitly, write y=(1.10011)2 and divide through to get:

$\frac{(1.1001100110)_2}{(1.10011)_2} = 1 + \frac{2^{-8}}{y} + \frac{2^{-9}}{y}$

We then have

$y^{-1} = \frac{32}{51} = 0.627\ldots = (0.101\ldots)_2$

Hence

$1 + \frac{2^{-8}}{y} + \frac{2^{-9}}{y} = 1 + (2^{-9} + 2^{-11} + \cdots) + (2^{-10} + \cdots) = (1.00000000111\ldots)_2$

Therefore we round up (the $\cdots$ is not exactly zero, but if it were it would be a tie and we would round up anyway to get a zero last bit) and get:

Problem 4.6 (B) Find a bound on the absolute error in terms of a constant times ϵm for the following computations

$(1.1 \times 1.2) + 1.3, \qquad (1.1 - 1)/0.1$

implemented using floating point arithmetic (with any precision).

SOLUTION

The first problem is very similar to what we saw in lecture. Write

$({\rm fl}(1.1) \otimes {\rm fl}(1.2)) \oplus {\rm fl}(1.3) = (1.1(1+\delta_1) \cdot 1.2(1+\delta_2)(1+\delta_3) + 1.3(1+\delta_4))(1+\delta_5)$

We first write

$1.1(1+\delta_1) \cdot 1.2(1+\delta_2)(1+\delta_3) = 1.32(1+\delta_6)$

where

$|\delta_6| \leq |\delta_1| + |\delta_2| + |\delta_3| + |\delta_1||\delta_2| + |\delta_1||\delta_3| + |\delta_2||\delta_3| + |\delta_1||\delta_2||\delta_3| \leq 4\epsilon_m$

Then we have

$1.32(1+\delta_6) + 1.3(1+\delta_4) = 2.62 + \underbrace{1.32\delta_6 + 1.3\delta_4}_{\delta_7}$

where

$|\delta_7| \leq 7\epsilon_m$

Finally,

$(2.62 + \delta_7)(1+\delta_5) = 2.62 + \underbrace{\delta_7 + 2.62\delta_5 + \delta_7\delta_5}_{\delta_8}$

where

$|\delta_8| \leq 10\epsilon_m$

 

For the second part, we do:

$({\rm fl}(1.1) \ominus 1) \oslash {\rm fl}(0.1) = \frac{(1.1(1+\delta_1) - 1)(1+\delta_2)}{0.1(1+\delta_3)}(1+\delta_4)$

Write

$\frac{1}{1+\delta_3} = 1 + \delta_5$

where

$|\delta_5| \leq \left|\frac{\delta_3}{1+\delta_3}\right| \leq \frac{\epsilon_m/2}{1 - 1/2} \leq \epsilon_m$

using the fact that |δ3|<1/2. Further write

$(1+\delta_5)(1+\delta_4) = 1 + \delta_6$

where

$|\delta_6| \leq |\delta_5| + |\delta_4| + |\delta_5||\delta_4| \leq 2\epsilon_m$

We also write

$\frac{(1.1(1+\delta_1) - 1)(1+\delta_2)}{0.1} = 1 + \underbrace{11\delta_1 + \delta_2 + 11\delta_1\delta_2}_{\delta_7}$

where

$|\delta_7| \leq 17\epsilon_m$

Then we get

$({\rm fl}(1.1) \ominus 1) \oslash {\rm fl}(0.1) = (1+\delta_7)(1+\delta_6) = 1 + \delta_7 + \delta_6 + \delta_6\delta_7$

and the error is bounded by:

$(17 + 2 + 34)\epsilon_m = 53\epsilon_m$

This is quite pessimistic but still captures that we are on the order of ϵm.

 

 

 

5. Interval arithmetic

The following problems consider implementation of interval arithmetic for proving precise bounds on arithmetic operations. That is, recall the set operations

$A + B = \{x + y : x \in A,\ y \in B\}, \qquad AB = \{xy : x \in A,\ y \in B\}.$

Problem 5.1 (B) For intervals $A = [a, b]$ and $B = [c, d]$, such that $0 \notin A, B$, and an integer $n \neq 0$, deduce formulas for the minimum and maximum of $A/n$, $A + B$ and $AB$.

Solution

$\frac{A}{n} = \begin{cases} [a/n, b/n] & n > 0 \\ [b/n, a/n] & n < 0 \end{cases}, \qquad A + B = [a + c, b + d]$

$AB = \begin{cases} [bd, ac] & a, b, c, d < 0 \\ [ad, bc] & a, b < 0 \text{ and } c, d > 0 \\ [bc, ad] & a, b > 0 \text{ and } c, d < 0 \\ [ac, bd] & a, b, c, d > 0 \end{cases}$
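These case formulas translate directly into code. A Python sketch (the sheet implements this in Julia; the class name `Interval` and the operator choices here are ours), assuming 0 is in neither interval:

```python
class Interval:
    """Closed interval [a, b], with the case formulas from Problem 5.1."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __add__(self, other):
        return Interval(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        a, b, c, d = self.a, self.b, other.a, other.b
        if b < 0 and d < 0:  # both negative
            return Interval(b * d, a * c)
        if b < 0 and c > 0:  # self negative, other positive
            return Interval(a * d, b * c)
        if a > 0 and d < 0:  # self positive, other negative
            return Interval(b * c, a * d)
        return Interval(a * c, b * d)  # both positive

    def __truediv__(self, n):  # n a nonzero integer
        if n > 0:
            return Interval(self.a / n, self.b / n)
        return Interval(self.b / n, self.a / n)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"

print(Interval(1, 2) + Interval(3, 4))    # [4, 6]
print(Interval(-2, -1) * Interval(3, 4))  # [-8, -3]
print(Interval(1, 2) / -2)                # [-1.0, -0.5]
```

A rigorous implementation (Problem 5.2) would additionally round the left endpoint down and the right endpoint up after each operation.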

Problem 5.2 (B) We want to implement floating point variants such that, for $S = [a,b] + [c,d]$, $P = [a,b][c,d]$, and $D = [a,b]/n$ for an integer $n$,

$[a,b] \oplus [c,d] := [{\rm fl}^{\rm down}(\min S), {\rm fl}^{\rm up}(\max S)]$
$[a,b] \otimes [c,d] := [{\rm fl}^{\rm down}(\min P), {\rm fl}^{\rm up}(\max P)]$
$[a,b] \oslash n := [{\rm fl}^{\rm down}(\min D), {\rm fl}^{\rm up}(\max D)]$

This guarantees $S \subseteq [a,b] \oplus [c,d]$, $P \subseteq [a,b] \otimes [c,d]$, and $D \subseteq [a,b] \oslash n$. In other words, if $x \in [a,b]$ and $y \in [c,d]$ then $x + y \in [a,b] \oplus [c,d]$, and we thereby have bounds on $x + y$.

Use the formulae from Problem 5.1 to complete (by replacing the # TODO: … comments with code) the following implementation of an Interval so that +, *, and / implement $\oplus$, $\otimes$, and $\oslash$ as defined above.

Problem 5.3 (A) The following function computes the first n+1 terms of the Taylor series of exp(x):

$\sum_{k=0}^n \frac{x^k}{k!}$

Bound the tail of the Taylor series for $e^x$ assuming $|x| \leq 1$. (Hint: $e^x \leq 3$ for $x \leq 1$.) Use the bound to complete the function exp_bound which computes $e^x$ with rigorous error bounds, that is, so that when applied to an interval $[a,b]$ it returns an interval that is guaranteed to contain the interval $[e^a, e^b]$.

Check your result for computing $e$ and $e^{-1}$ by assuring that the following returns true:

Further, ensure that the width of each returned interval is less than $10^{-14}$.

SOLUTION From the Taylor remainder theorem we know the error is

$\left|\frac{f^{(k+1)}(\xi)}{(k+1)!}x^{k+1}\right| \leq \frac{3}{(k+1)!}$

Thus by widening the computation by this error we ensure that we have captured the error by truncating the Taylor series.
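The tail bound itself is easy to check numerically. A Python sketch (the sheet's exp_bound is a Julia function; the helper name `taylor_exp` is ours), using plain floats rather than rigorous intervals:

```python
import math

def taylor_exp(x, n):
    """Sum of the first n+1 Taylor terms of exp at 0."""
    s, term = 0.0, 1.0
    for k in range(n + 1):
        s += term
        term *= x / (k + 1)
    return s

# For |x| <= 1 the truncation error should be at most 3/(n+1)!.
for x in (1.0, -1.0, 0.5):
    for n in (2, 5, 10):
        err = abs(math.exp(x) - taylor_exp(x, n))
        assert err <= 3 / math.factorial(n + 1)
```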

MATH50003 Numerical Analysis: Problem Sheet 2

This week we look at other variants of finite-differences, including central differences and second-order finite-differences. We also investigate mathematical properties of dual numbers and extend their implementation to other functions. Finally, we see how dual numbers can be combined with Newton iteration for root finding.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

1. Finite-differences

Problem 1.1⋆ (B) Use Taylor's theorem to derive an error bound for central differences

$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}.$

Find an error bound when implemented in floating point arithmetic, assuming that $f^{\rm FP}(x) = f(x) + \delta_x$ where $|\delta_x| \leq c\epsilon_m$.

SOLUTION

By Taylor's theorem, the approximation around $x + h$ is $f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(z_1)}{6}h^3$ for some $z_1 \in (x, x+h)$, and similarly $f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(z_2)}{6}h^3$ for some $z_2 \in (x-h, x)$.

Subtracting the second expression from the first we obtain $f(x+h) - f(x-h) = 2f'(x)h + \frac{f'''(z_1) + f'''(z_2)}{6}h^3$. Hence,

$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \underbrace{\frac{f'''(z_1) + f'''(z_2)}{12}h^2}_{\delta_{\rm Taylor}}.$

Thus, the error can be bounded by $|\delta_{\rm Taylor}| \leq \frac{M}{6}h^2$, where

$M = \max_{y \in [x-h, x+h]} |f'''(y)|.$

In floating point we have

$(f^{\rm FP}(x+h) \ominus f^{\rm FP}(x-h)) \oslash (2h) = \frac{f(x+h) + \delta_{x+h} - f(x-h) - \delta_{x-h}}{2h}(1+\delta_1) = \frac{f(x+h) - f(x-h)}{2h}(1+\delta_1) + \frac{\delta_{x+h} - \delta_{x-h}}{2h}(1+\delta_1)$

Applying Taylor's theorem we get

$(f^{\rm FP}(x+h) \ominus f^{\rm FP}(x-h)) \oslash (2h) = f'(x) + \underbrace{f'(x)\delta_1 + \delta_{\rm Taylor}(1+\delta_1) + \frac{\delta_{x+h} - \delta_{x-h}}{2h}(1+\delta_1)}_{\delta_{x,h}^{\rm CD}}$

where

$|\delta_{x,h}^{\rm CD}| \leq \frac{|f'(x)|}{2}\epsilon_m + \frac{M}{3}h^2 + \frac{2c\epsilon_m}{h}$

Problem 1.2 (B) Implement central differences for $f(x) = 1 + x + x^2$ and $g(x) = 1 + x/3 + x^2$. Plot the errors for h = 2.0 .^ (0:-1:-60) and h = 10.0 .^ (0:-1:-16). Derive the error exactly for the different cases to explain the observed behaviour.

SOLUTION

To compute the errors of the central difference approximation of f(x) we compute

$\left|\frac{f(x+h) - f(x-h)}{2h} - f'(x)\right| = \left|\frac{1 + (x+h) + (x+h)^2 - 1 - (x-h) - (x-h)^2}{2h} - (1 + 2x)\right| = \left|\frac{2h + 4hx}{2h} - 1 - 2x\right| = 0.$

As we can see, in this case the central difference approximation is exact. The errors we start observing for small step sizes are thus numerical in nature. The values of the function at f(x+h) and f(xh) eventually become numerically indistinguishable and thus this finite difference approximation to the derivative incorrectly results in 0.

To compute the errors of the central difference approximation of g(x) we compute

$\left|\frac{g(x+h) - g(x-h)}{2h} - g'(x)\right| = \left|\frac{1 + (x+h)/3 + (x+h)^2 - 1 - (x-h)/3 - (x-h)^2}{2h} - \left(\tfrac{1}{3} + 2x\right)\right| = \left|\tfrac{1}{3} + 2x - \tfrac{1}{3} - 2x\right| = 0.$
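Since the truncation error vanishes for these quadratics, any error observed for small h is purely floating point. A Python sketch of the experiment (the sheet itself plots this in Julia):

```python
# Central differences are exact for the quadratic f in exact
# arithmetic; the printed errors for tiny h are rounding artifacts.
f = lambda x: 1 + x + x**2
x = 0.1
for h in (0.5, 1e-5, 1e-14):
    approx = (f(x + h) - f(x - h)) / (2 * h)
    print(h, abs(approx - (1 + 2 * x)))
```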

 

Problem 1.3 (A)⋆ Use Taylor's theorem to derive an error bound on the second-order derivative approximation

$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$

Find an error bound when implemented in floating point arithmetic, assuming that

$f^{\rm FP}(x) = f(x) + \delta_x$

where $|\delta_x| \leq c\epsilon_m$.

SOLUTION

Using the same two formulas as in 1.1, we have $f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(z_1)}{6}h^3$ for some $z_1 \in (x, x+h)$, and $f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(z_2)}{6}h^3$ for some $z_2 \in (x-h, x)$.

Summing the two we obtain $f(x+h) + f(x-h) = 2f(x) + f''(x)h^2 + \frac{f'''(z_1) - f'''(z_2)}{6}h^3$.

Thus, $f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + \frac{f'''(z_2) - f'''(z_1)}{6}h.$

Hence, the error is $\left|f''(x) - \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}\right| = \left|\frac{f'''(z_2) - f'''(z_1)}{6}h\right| \leq 2Ch$, where again $C = \max_{y \in [x-h, x+h]}\left|\frac{f'''(y)}{6}\right|$.

In floating point arithmetic, the error is $\left|\frac{f^{\rm FP}(x+h) - 2f^{\rm FP}(x) + f^{\rm FP}(x-h)}{h^2} - f''(x)\right| \leq \left|\frac{\delta_{x+h} - 2\delta_x + \delta_{x-h}}{h^2}\right| + 2Ch \leq \frac{4c\epsilon_m}{h^2} + 2Ch.$

Problem 1.4 (B) Use finite-differences, central differences, and second-order finite-differences to approximate to 5-digits the first and second derivatives to the following functions at the point x=0.1:

$\exp(e^x \cos x + \sin x), \qquad \prod_{k=1}^{1000}\left(\frac{x}{k} - 1\right), \qquad \text{and } f^s_{1000}(x)$

where $f^s_n(x)$ corresponds to $n$ terms of the following continued fraction:

$1 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \cdots}}},$

e.g.: $f^s_1(x) = 1 + \frac{x-1}{2}$, $\quad f^s_2(x) = 1 + \cfrac{x-1}{2 + \cfrac{x-1}{2}}$, $\quad f^s_3(x) = 1 + \cfrac{x-1}{2 + \cfrac{x-1}{2 + \cfrac{x-1}{2}}}$

SOLUTION

2. Dual numbers

Problem 2.1⋆ (C) Show that the dual numbers $\mathbb{D}$ are a commutative ring, that is, for all $a, b, c \in \mathbb{D}$ the following are satisfied:

  1. additive associativity: $(a + b) + c = a + (b + c)$
  2. additive commutativity: $a + b = b + a$
  3. additive identity: there exists $0 \in \mathbb{D}$ such that $a + 0 = a$
  4. additive inverse: there exists $-a$ such that $(-a) + a = 0$
  5. multiplicative associativity: $(ab)c = a(bc)$
  6. multiplicative commutativity: $ab = ba$
  7. multiplicative identity: there exists $1 \in \mathbb{D}$ such that $1a = a$
  8. distributivity: $a(b + c) = ab + ac$

SOLUTION In what follows we write $a = a_r + a_d\epsilon$ and likewise for $b, c \in \mathbb{D}$.

Additive associativity and commutativity and existence of additive inverse are both immediate results of dual number addition reducing to element-wise real number addition. Furthermore, by definition of addition on D the dual number 0+0ϵ acts as the additive identity since

$(a_r + a_d\epsilon) + (0 + 0\epsilon) = a_r + a_d\epsilon.$

We explicitly prove multiplicative commutativity

$ab = (a_r + a_d\epsilon)(b_r + b_d\epsilon) = a_rb_r + (a_rb_d + a_db_r)\epsilon = b_ra_r + (b_ra_d + b_da_r)\epsilon = ba.$

We also explicitly prove multiplicative associativity:

$(ab)c = (a_rb_r + (a_rb_d + a_db_r)\epsilon)c = a_rb_rc_r + ((a_rb_d + a_db_r)c_r + a_rb_rc_d)\epsilon = a_rb_rc_r + (a_rb_dc_r + a_db_rc_r + a_rb_rc_d)\epsilon$

and

$a(bc) = a(b_rc_r + (b_rc_d + b_dc_r)\epsilon) = a_rb_rc_r + (a_rb_dc_r + a_db_rc_r + a_rb_rc_d)\epsilon.$

The number 1+0ϵ serves as the multiplicative identity. Note that for any dual number a, we have

$(1 + 0\epsilon)(a_r + a_d\epsilon) = 1 \cdot a_r + (a_r \cdot 0 + 1 \cdot a_d)\epsilon = a_r + a_d\epsilon = a.$

Finally we show distributivity of multiplication:

$a(b + c) = a(b_r + c_r + (b_d + c_d)\epsilon) = (a_rb_r + a_rc_r) + (a_rb_d + a_rc_d + a_db_r + a_dc_r)\epsilon,$
$ab + ac = a_rb_r + (a_db_r + a_rb_d)\epsilon + a_rc_r + (a_dc_r + a_rc_d)\epsilon = (a_rb_r + a_rc_r) + (a_rb_d + a_rc_d + a_db_r + a_dc_r)\epsilon.$

Problem 2.2⋆ (C) A field is a commutative ring such that $0 \neq 1$ and all nonzero elements have a multiplicative inverse, i.e., there exists $a^{-1}$ such that $aa^{-1} = 1$. Why isn't $\mathbb{D}$ a field?

SOLUTION

Fields require that all nonzero elements have a unique multiplicative inverse. However, this is not the case for dual numbers. To give an explicit counter example, we show that there is no dual number z which is the inverse of 0+ϵ, i.e. a dual number z such that

$(0 + \epsilon)(z_r + z_d\epsilon) = 1 + 0\epsilon.$

By appropriate multiplication with the conjugate we show that

$\frac{(0+\epsilon)(z_r - z_d\epsilon)}{(z_r + z_d\epsilon)(z_r - z_d\epsilon)} = \frac{z_r\epsilon}{z_r^2} = \frac{\epsilon}{z_r}.$

This proves that no choice of real part zr can reach the multiplicative identity 1+0ϵ when starting from the number 0+ϵ. More general results for zero real part dual numbers can also be proved.

Problem 2.3⋆ (B) A matrix representation of a ring are maps from a group/ring to matrices such that matrix addition and multiplication behave exactly like addition and multiplication of the ring. That is, if A and B are elements of the ring and ρ is a representation, then

ρ(A+B)=ρ(A)+ρ(B) and ρ(AB)=ρ(A)ρ(B).

Show that the following are matrix representations of complex numbers and dual numbers (respectively):

$\rho: a + bi \mapsto \begin{pmatrix} a & b \\ -b & a \end{pmatrix},$

$\rho_\epsilon: a + b\epsilon \mapsto \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}.$

SOLUTION

Let A=a+bi and B=c+di and ρ be the map to their proposed matrix representations. First we check that addition complies with addition of complex numbers, that is (a+bi)+(c+di)=(a+c)+(b+d)i:

$\rho(A) + \rho(B) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} + \begin{pmatrix} c & d \\ -d & c \end{pmatrix} = \begin{pmatrix} a+c & b+d \\ -(b+d) & a+c \end{pmatrix} = \rho(A+B).$

Next for multiplication we need $(a + bi)(c + di) = (ac - bd) + (bc + ad)i$:

$\rho(A)\rho(B) = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}\begin{pmatrix} c & d \\ -d & c \end{pmatrix} = \begin{pmatrix} ac - bd & bc + ad \\ -(bc + ad) & ac - bd \end{pmatrix} = \rho(AB)$

Now we move on to dual numbers $A = a + b\epsilon$ and $B = c + d\epsilon$ and their matrix representation map $\rho_\epsilon$. For addition of dual numbers, $(a + b\epsilon) + (c + d\epsilon) = (a + c) + (b + d)\epsilon$:

$\rho_\epsilon(A) + \rho_\epsilon(B) = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} + \begin{pmatrix} c & d \\ 0 & c \end{pmatrix} = \begin{pmatrix} a+c & b+d \\ 0 & a+c \end{pmatrix} = \rho_\epsilon(A+B).$

And finally for multiplication of dual numbers, $(a + b\epsilon)(c + d\epsilon) = ac + (ad + bc)\epsilon$:

$\rho_\epsilon(A)\rho_\epsilon(B) = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}\begin{pmatrix} c & d \\ 0 & c \end{pmatrix} = \begin{pmatrix} ac & ad + bc \\ 0 & ac \end{pmatrix} = \rho_\epsilon(AB).$

Problem 2.4⋆ (C) What is the correct definition of division on dual numbers, i.e.,

$(a + b\epsilon)/(c + d\epsilon) = s + t\epsilon$

for what choice of s and t? Use dual numbers to compute the derivative of the following functions at x=0.1:

$\exp(e^x \cos x + \sin x), \qquad \prod_{k=1}^{3}\left(\frac{x}{k} - 1\right), \qquad \text{and } f^s_2(x) = 1 + \cfrac{x-1}{2 + \cfrac{x-1}{2}}$

SOLUTION

As with complex numbers, division is easiest to understand by first multiplying with the conjugate, that is: $\frac{a + b\epsilon}{c + d\epsilon} = \frac{(a + b\epsilon)(c - d\epsilon)}{(c + d\epsilon)(c - d\epsilon)}$. Expanding the products and dropping terms with $\epsilon^2$ then leaves us with the definition of division for dual numbers (where the denominator must have non-zero real part): $\frac{a}{c} + \frac{bc - ad}{c^2}\epsilon$. Thus we have $s = \frac{a}{c}$ and $t = \frac{bc - ad}{c^2}$.

We saw in the lectures that we have

$\exp(a + b\epsilon) = \exp(a) + b\exp(a)\epsilon,$
$\sin(a + b\epsilon) = \sin(a) + b\cos(a)\epsilon,$

and

$\cos(a + b\epsilon) = \cos(a) - b\sin(a)\epsilon.$

Using these, we find that evaluating $\exp(e^x \cos x + \sin x)$ at $a + b\epsilon$ yields

$\exp((\exp(a) + b\exp(a)\epsilon)(\cos(a) - b\sin(a)\epsilon) + \sin(a) + b\cos(a)\epsilon)$
$= \exp(\exp(a)\cos(a) + (\cos(a)\exp(a) - \sin(a)\exp(a))b\epsilon + \sin(a) + b\cos(a)\epsilon) = \exp((\exp(a)\cos(a) + \sin(a)) + b(\cos(a)\exp(a) - \sin(a)\exp(a) + \cos(a))\epsilon)$

which means $\exp(e^x \cos x + \sin x)$ evaluated at $a + b\epsilon$ is

$\exp(\exp(a)\cos(a) + \sin(a)) + b(\cos(a)\exp(a) - \sin(a)\exp(a) + \cos(a))\exp(\exp(a)\cos(a) + \sin(a))\epsilon.$

Setting b=1 the derivative at point a is thus given by the dual part, i.e.:

$(\cos(a)\exp(a) - \sin(a)\exp(a) + \cos(a))\exp(\exp(a)\cos(a) + \sin(a)).$
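A minimal dual-number type with the rules above (exp, sin, cos and /) reproduces this derivative mechanically. The following is a Python sketch (the course implements the same idea in Julia; the class and function names here are ours):

```python
import math

class Dual:
    """Dual number r + d*eps with eps^2 = 0."""
    def __init__(self, r, d=0.0):
        self.r, self.d = r, d
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.r + o.r, self.d + o.d)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.r * o.r, self.r * o.d + self.d * o.r)
    def __truediv__(self, o):
        # division rule: s = a/c, t = (bc - ad)/c^2
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.r / o.r, (self.d * o.r - self.r * o.d) / o.r**2)

def dexp(z): return Dual(math.exp(z.r), z.d * math.exp(z.r))
def dsin(z): return Dual(math.sin(z.r), z.d * math.cos(z.r))
def dcos(z): return Dual(math.cos(z.r), -z.d * math.sin(z.r))

x = Dual(0.1, 1.0)                       # dual part 1 extracts f'(0.1)
fx = dexp(dexp(x) * dcos(x) + dsin(x))
a = 0.1
exact = (math.cos(a)*math.exp(a) - math.sin(a)*math.exp(a) + math.cos(a)) \
        * math.exp(math.exp(a)*math.cos(a) + math.sin(a))
assert abs(fx.d - exact) < 1e-12
```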

Problem 2.5 (C) Add support for cos, sin, and / to the type Dual:

Problem 2.6 (B) Use dual numbers to compute the derivatives to

exp(expxcosx+sinx),k=11000(xk1), and f1000s(x).

Does your answer match (to 5 digits) Problem 1.4?

SOLUTION

With the previous problems solved, this is as simple as running

3. Newton iteration

Newton iteration is an algorithm for computing roots of a function $f$ using its derivative: given an initial guess $x_0$, one obtains a sequence of guesses via

$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$
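The iteration itself is a few lines. A Python sketch with the derivative supplied analytically (the sheet's version extracts $f'(x_k)$ from a Dual number instead; the name `newton` is ours):

```python
def newton(f, fp, x0, n):
    """n steps of Newton iteration for a root of f, from guess x0."""
    x = x0
    for _ in range(n):
        x = x - f(x) / fp(x)
    return x

# Example: the positive root of x^2 - 2.
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0, 10)
print(root)   # about 1.41421356...
```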

Problem 3.1 (B) Use Dual to implement the following function which returns xn:

SOLUTION

Here is a simple test case for the above function:

END

Problem 3.2 (B) Compute points y such that $|f(y)| \leq 10^{-13}$ (i.e., approximate roots):

$\exp(e^x \cos x + \sin x) - 6 \quad \text{and} \quad \prod_{k=1}^{1000}\left(\frac{x}{k} - 1\right) - \frac{1}{2}$

(Hint: you may need to try different x0 and n to get convergence. Plotting the function should give an indication of a good initial guess.)

SOLUTION

We can plot the functions on a few ranges to get an intuition

And then use our Newton iteration to compute approximate roots

Problem 3.3 (A) Compute points y such that $|f^s_{1000}(y) - j| \leq 10^{-13}$ for $j = 1, 2, 3$. Make a conjecture of what $f^s_n(x)$ converges to as $n \to \infty$. (Bonus problem: Prove your conjecture.)

SOLUTION

By plotting the function we can conjecture that the continued fraction converges to $\sqrt{x}$:

There are a handful of ways to prove this conjecture. Here is one – start with

$\sqrt{x}(1 + \sqrt{x}) = \sqrt{x} + x,$

then extend the RHS by $0 = 1 - 1$ to also obtain the factor $1 + \sqrt{x}$ there, resulting in

$\sqrt{x}(1 + \sqrt{x}) = (1 + \sqrt{x}) + x - 1.$

Dividing through by $(1 + \sqrt{x})$ now yields

$\sqrt{x} = 1 + \frac{x - 1}{1 + \sqrt{x}}.$

Note that we can now plug this equation into itself on the right hand side to obtain a recursive continued fraction for the square root function.
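Iterating the truncated continued fraction numerically supports the conjecture. A Python sketch (the helper name `f_s` is ours, mirroring the sheet's $f^s_n$):

```python
import math

def f_s(x, n):
    """n terms of the continued fraction 1 + (x-1)/(2 + (x-1)/(2 + ...))."""
    t = 2.0                      # innermost denominator
    for _ in range(n - 1):
        t = 2 + (x - 1) / t
    return 1 + (x - 1) / t

for x in (0.5, 2.0, 9.0):
    assert abs(f_s(x, 1000) - math.sqrt(x)) < 1e-10
```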

MATH50003 Numerical Analysis: Problem Sheet 3

This problem sheet explores implementation of triangular solves, supporting a matrix with two super-diagonals, as well as permutation and Householder reflections that can be applied to a vector in O(n) complexity.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

1. Dense Matrices

Problem 1.1 (C) Show that A*x is not implemented as mul(A, x) from the lecture notes by finding a Float64 example where the bits do not match.

SOLUTION

First we have to define mul(A, x) as in the lecture notes:

Then we can easily find examples, in fact we can write a function that searches for examples:

2. Triangular Matrices

Problem 2.1 (B) Complete the following functions for solving linear systems with triangular systems by implementing back and forward-substitution:

SOLUTION

Here is an example:

 

Problem 2.2⋆ (B) Given $\mathbf{x} \in \mathbb{R}^n$, find a lower triangular matrix of the form

$L = I - 2\mathbf{v}\mathbf{e}_1^\top$

such that:

$L\mathbf{x} = x_1\mathbf{e}_1.$

What does $L\mathbf{y}$ equal if $\mathbf{y} \in \mathbb{R}^n$ satisfies $y_1 = \mathbf{e}_1^\top\mathbf{y} = 0$?

SOLUTION

By straightforward computation we find

$L\mathbf{x} = \mathbf{x} - 2\mathbf{v}\mathbf{e}_1^\top\mathbf{x} = \mathbf{x} - 2\mathbf{v}x_1$

and thus we find such a lower triangular $L$ by choosing $v_1 = 0$ and $v_k = \frac{x_k}{2x_1}$ for $k = 2, \ldots, n$, provided $x_1 \neq 0$. Moreover, since $v_1 = 0$ and $\mathbf{e}_1^\top\mathbf{y} = 0$, we have $L\mathbf{y} = \mathbf{y} - 2\mathbf{v}y_1 = \mathbf{y}$.

3. Banded matrices

Problem 3.1 (C) Complete the implementation of UpperTridiagonal which represents a banded matrix with bandwidths (l,u)=(0,2):

SOLUTION

We can check that the above methods to read and write entries work:

Problem 3.2 (B) Complete the following implementations of * and \ for UpperTridiagonal so that they take only O(n) operations.

SOLUTION

And here is an example of what we have implemented in action:

And just for fun, let's do a larger scale dense speed comparison

4. Permutations

Problem 4.1⋆ (C) What are the permutation matrices corresponding to the following permutations?

$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 4 & 3 & 6 & 5 \end{pmatrix}.$

SOLUTION

Let $\sigma = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$ and $\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 4 & 3 & 6 & 5 \end{pmatrix}$. There are two ways to construct $P_\sigma$ and $P_\tau$.

Either way, we have $P_\sigma = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad\text{and}\quad P_\tau = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$

Problem 4.2 (B) Complete the implementation of a type representing permutation matrices that supports P[k,j] and such that * takes O(n) operations.

The following code shows that * takes O(n) time and space.

5. Orthogonal matrices

Problem 5.1⋆ (C) Show that orthogonal matrices preserve the 2-norm of vectors:

$\|Q\mathbf{v}\| = \|\mathbf{v}\|.$

SOLUTION

$\|Q\mathbf{v}\|^2 = (Q\mathbf{v})^\top Q\mathbf{v} = \mathbf{v}^\top Q^\top Q\mathbf{v} = \mathbf{v}^\top\mathbf{v} = \|\mathbf{v}\|^2$

Problem 5.2⋆ (B) Show that the eigenvalues $\lambda$ of an orthogonal matrix $Q$ are on the unit circle: $|\lambda| = 1$.

SOLUTION Let $\mathbf{v}$ be a unit eigenvector corresponding to $\lambda$: $Q\mathbf{v} = \lambda\mathbf{v}$ with $\|\mathbf{v}\| = 1$. Then

$1 = \|\mathbf{v}\| = \|Q\mathbf{v}\| = \|\lambda\mathbf{v}\| = |\lambda|.$

Problem 5.3⋆ (A) Explain why an orthogonal matrix Q must be equal to I if all its eigenvalues are 1.

SOLUTION

Note that $Q$ is normal ($Q^\top Q = I = QQ^\top$) and therefore by the spectral theorem for normal matrices we have

$Q = \tilde{Q}\Lambda\tilde{Q}^* = \tilde{Q}\tilde{Q}^* = I$

since Q̃ is unitary.

Problem 5.4 (B) Complete the implementation of a type representing reflections that supports Q[k,j] and such that * takes O(n) operations.

 

 

If your code is correct, these "unit tests" will succeed

 

 

Problem 5.5 (C) Complete the following implementation of a Householder reflection so that the unit tests pass. Here s == true means the Householder reflection is sent to the positive axis and s == false is the negative axis.

Problem 5.6⋆ (A) Consider a Householder reflection with $\mathbf{x} = [1, h]$ with $h = 2^{-n}$. What is the floating point error in computing $\mathbf{y} = \mp\|\mathbf{x}\|\mathbf{e}_1 + \mathbf{x}$ for each choice of sign?

SOLUTION

Since $\|\mathbf{x}\| = \sqrt{1 + h^2}$, we have $\mathbf{y} = [1 \mp \sqrt{1+h^2}, h]$. We note first that $h_{\rm fp}$ and $(h^2)_{\rm fp}$ are exact due to the choice of $h$, so we only need to discuss the floating point error in computing $1 \mp \sqrt{1 + h^2}$.

Numerically, let the length of the significand be S, then

$1 \oplus h^2 = \begin{cases} 1 + h^2 & n \leq S/2 \\ 1 & n > S/2 \end{cases} = 1 + h^2 + \delta_1$

where $|\delta_1| \leq \frac{\epsilon_m}{2}$.

PLUS

Since $1 \oplus h^2 > 0$, we know that $1 \oplus \sqrt{1 \oplus h^2}_{\rm fp} = (1+\delta_2)\left(1 + \sqrt{1 + h^2 + \delta_1}\,(1+\delta_3)\right)$ where $|\delta_2|, |\delta_3| \leq \frac{\epsilon_m}{2}$. Then

$\frac{1 \oplus \sqrt{1 \oplus h^2}_{\rm fp}}{1 + \sqrt{1+h^2}} = (1+\delta_2)\left(1 + \frac{(1+\delta_3)\left(\sqrt{1+h^2+\delta_1} - \sqrt{1+h^2}\right) + \delta_3\sqrt{1+h^2}}{1 + \sqrt{1+h^2}}\right) \approx (1+\delta_2)\left(1 + \frac{\delta_1}{2(1 + \sqrt{1+h^2})\sqrt{1+h^2}} + \frac{\delta_3\sqrt{1+h^2}}{1 + \sqrt{1+h^2}}\right)$

and we can bound the relative error by

$|\delta_2| + \frac{|\delta_1|}{4} + \frac{3|\delta_3|}{4} \leq \epsilon_m.$

In conclusion, it's very accurate to compute 1+1+h2. Let us verify this:

MINUS

If $n > S/2$, then $1 \ominus \sqrt{1 \oplus h^2}_{\rm fp} = 1 \ominus 1 = 0$, so the relative error is 100%.

If $n \leq S/2$ but not too small, $1 \oplus h^2$ is exactly $1 + h^2$, but $\sqrt{1+h^2}_{\rm fp}$ can have rounding error. Expand $\sqrt{1+h^2}$ into a Taylor series:

$\sqrt{1+h^2} = 1 + \frac{1}{2}h^2 - \frac{1}{8}h^4 + \frac{1}{16}h^6 - O(h^8) = 1 + 2^{-2n-1} - 2^{-4n-3} + 2^{-6n-4} - O(2^{-8n})$

so

$\sqrt{1+h^2}_{\rm fp} = \begin{cases} 1 & n = S/2 \\ 1 + \frac{1}{2}h^2 & \frac{S-3}{4} \leq n < S/2 \\ 1 + \frac{1}{2}h^2 - \frac{1}{8}h^4 & \frac{S-4}{6} \leq n < \frac{S-3}{4} \end{cases}$

where we can conclude that the absolute error is approximately $\frac{1}{2}h^2$, $\frac{1}{8}h^4$, $\frac{1}{16}h^6, \ldots$ for each stage when $h$ is small. Keeping in mind that $1 - \sqrt{1+h^2} \approx -\frac{1}{2}h^2$ when $h$ is small, the relative error is approximately $1$, $\frac{1}{4}h^2$, $\frac{1}{8}h^4, \ldots$ for each stage. Special note: the relative error is exactly 1 in the first stage, when $n = S/2$.

If $n$ is so small that $\sqrt{1+h^2}$ is noticeably larger than 1, the absolute error can be bounded by $\frac{\epsilon_m}{2}$, so the relative error is bounded by $\frac{\epsilon_m}{2(\sqrt{1+h^2} - 1)} \approx \frac{\epsilon_m}{h^2}$.

Let us verify these conclusions:

From this plot we can clearly identify the 3 phases:

  1. When n is very small, the relative error grows smoothly with n;
  2. When $n$ is neither too large nor too small, the relative error has jumps at around $S/2 = 26$, $(S-3)/4 = 12.25$ and $(S-4)/6 = 8$. In each stage the slope is as predicted. For example, the first stage from $S/2$ down to $(S-3)/4$ has relative error $\frac{1}{4}h^2$, hence the slope is $-2$ between 13 and 25.
  3. When $n \geq S/2$, the relative error is 100%.

The transition point between phases 1 and 2 is at the stage where the $O(h^8)$ relative error from phase 2 meets the $\frac{\epsilon_m}{h^2}$ error from phase 1.

MATH50003 Numerical Analysis: Problem Sheet 4

This problem sheet explores least squares, the QR decomposition including for tridiagonal matrices, and the PLU decompositions.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

 

1. Least squares and QR decompositions

Problem 1.1 (B) Find and plot the best least squares fit of $\frac{1}{5x^2 + 1}$ by degree $n$ polynomials for $n = 0, \ldots, 10$ at 1000 evenly spaced points between 0 and 1.

SOLUTION

END

Problem 1.2⋆ (B) Show that every matrix has a QR decomposition such that the diagonal of R is non-negative. Make sure to include the case of more columns than rows.

SOLUTION

Beginning with the square case, a square matrix $A = QR$ with square orthogonal $Q$ and square upper triangular $R$ can always be rewritten in the form $A = QD^{-1}DR$, where $D$ is a diagonal matrix with ${\rm sign}(R[j,j])$ on the diagonal. As a result, $DR$ is an upper triangular matrix with non-negative diagonal. It remains to check that $QD^{-1} = QD$ is still orthogonal – this is easy to check since $QD(QD)^\top = QDDQ^\top = QQ^\top = I$. Note we have made use of the fact that the inverse of a diagonal matrix is diagonal, that any diagonal matrix satisfies $D^\top = D$, and that $DD = I$ since ${\rm sign}(R[j,j])^2 = 1$.

The same argument works for the non-square cases as long as we take care to consider the appropriate dimensions and pad with the identity matrix. Note that Q is always a square matrix in the QR decomposition. Assume the R factor has more columns m than rows n. Then a square n×n diagonal matrix with its n diagonal entries being sign(R[j,j]),j=1..n works with the same argument as above.

Finally, assume that $R$ has fewer columns $m$ than rows $n$. In this case the square $n \times n$ diagonal matrix with its first $m$ diagonal entries being ${\rm sign}(R[j,j])$, $j = 1, \ldots, m$, and any remaining diagonal entries being 1 works with the same argument as above.

Since the matrix D is always square diagonal and orthogonal by construction, everything else for both of these cases is exactly as in the square case.

Here are some practical examples in Julia:

END

Problem 1.3⋆ (B) Show that the QR decomposition of a square invertible matrix is unique, provided that the diagonal of R is positive.

SOLUTION

Assume there is a second decomposition also with positive diagonal

$A = QR = \tilde{Q}\tilde{R}$

Then we know

$Q^\top\tilde{Q} = R\tilde{R}^{-1}$

Note $Q^\top\tilde{Q}$ is orthogonal, and $R\tilde{R}^{-1}$ has positive eigenvalues (the diagonal), hence all $m$ eigenvalues of $Q^\top\tilde{Q}$ are 1. This means that $Q^\top\tilde{Q} = I$ and hence $\tilde{Q} = Q$, which then implies $\tilde{R} = R$. ∎ END

2. Gram–Schmidt

Problem 2.1⋆ (B) The modified Gram–Schmidt algorithm is a slight variation of Gram–Schmidt where instead of computing

$\mathbf{v}_j := \mathbf{a}_j - \sum_{k=1}^{j-1}\underbrace{\mathbf{q}_k^\top\mathbf{a}_j}_{r_{kj}}\mathbf{q}_k$

we compute it step-by-step:

$\mathbf{v}_j^1 := \mathbf{a}_j, \qquad \mathbf{v}_j^{k+1} := \mathbf{v}_j^k - \mathbf{q}_k^\top\mathbf{v}_j^k\,\mathbf{q}_k$

Show that $\mathbf{v}_j^j = \mathbf{v}_j$.

SOLUTION

Recall that the column vectors of $Q$ are orthonormal, i.e. $\mathbf{q}_i^\top\mathbf{q}_j = \delta_{ij}$. We show by induction that

$\mathbf{v}_j^i = \mathbf{a}_j - \sum_{k=1}^{i-1}\mathbf{q}_k^\top\mathbf{a}_j\,\mathbf{q}_k.$

Indeed, assuming this holds for $i$, orthonormality gives $\mathbf{q}_i^\top\mathbf{v}_j^i = \mathbf{q}_i^\top\mathbf{a}_j$ (every other term in the sum vanishes), so the update subtracts exactly $\mathbf{q}_i^\top\mathbf{a}_j\,\mathbf{q}_i$ and the formula holds with $i$ replaced by $i+1$. Taking $i = j$ gives $\mathbf{v}_j^j = \mathbf{v}_j$.

END

Problem 2.2 (B) Complete the following function implementing the modified Gram–Schmidt algorithm:

SOLUTION

END

Problem 2.3 (B) Compare the orthogonality of Q between gramschmidt and modifiedgramschmidt when applied to a 300 × 300 random matrix.

SOLUTION

For reference, here is the gramschmidt algorithm from the lecture notes:

Modified Gram–Schmidt is more accurate for constructing an orthogonal matrix:

END

3. Householder reflections

Problem 3.1 (B) Complete the definition of Reflections which supports a sequence of reflections, that is,

$Q = Q_{\mathbf{v}_1} \cdots Q_{\mathbf{v}_n}$

where the vectors are stored as a matrix V whose j-th column is 𝐯j, and

$Q_{\mathbf{v}_j} = I - 2\mathbf{v}_j\mathbf{v}_j^\top.$

Problem 3.2 (B) Complete the following function that implements Householder QR using only O(mn2) operations.

SOLUTION

Here we sketch some additional notes comparing performance.

END

4. Banded QR with Givens rotations

Problem 4.1⋆ (A) Describe an algorithm for computing the QR decomposition of a tridiagonal matrix using rotations instead of reflections to upper-triangularise column-by-column.

SOLUTION

Let A be a tridiagonal matrix:

$A = \begin{pmatrix} a_{11} & a_{12} & & & \\ a_{21} & a_{22} & a_{23} & & \\ & a_{32} & a_{33} & \ddots & \\ & & \ddots & \ddots & a_{n-1,n} \\ & & & a_{n,n-1} & a_{nn} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{pmatrix},$

where each $\mathbf{a}_j \in \mathbb{R}^n$ and $[\mathbf{a}_j]_k = 0$ for $|j - k| > 1$.

Recall that,

$\frac{1}{\sqrt{a^2 + b^2}}\begin{pmatrix} a & b \\ -b & a \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sqrt{a^2 + b^2} \\ 0 \end{pmatrix},$

and that,

$\frac{1}{\sqrt{a^2 + b^2}}\begin{pmatrix} a & b \\ -b & a \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$

is a rotation matrix where θ=arctan(b/a). With this in mind, consider multiplying A from the left by,

$Q_1 = \begin{pmatrix} \frac{a_{11}}{r_{11}} & \frac{a_{21}}{r_{11}} & & & \\ -\frac{a_{21}}{r_{11}} & \frac{a_{11}}{r_{11}} & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix},$

where $r_{11} = \sqrt{a_{11}^2 + a_{21}^2}$. This rotates dimensions 1 and 2 through angle $\theta = \arctan(a_{21}/a_{11})$. We have,

Q1a1=[r1100],Q1a2=[r12:=1r11(a11a12+a21a22)t1:=1r11(a11a22a21a12)a3200],Q1a3=[r13:=1r11a21a23s1:=1r11a11a23a33a4300],Q1ak=ak for k>3.

Then we take,

$Q_2 = \begin{pmatrix} 1 & & & \\ & \frac{t_1}{r_{22}} & \frac{a_{32}}{r_{22}} & \\ & -\frac{a_{32}}{r_{22}} & \frac{t_1}{r_{22}} & \\ & & & I_{n-3} \end{pmatrix},$

where $r_{22} = \sqrt{t_1^2 + a_{32}^2}$, a matrix which rotates dimensions 2 and 3 through angle $\theta_2 = \arctan(a_{32}/t_1)$. Then,

Q2Q1a1=Q1a1=[r1100],Q2Q1a2=Q2[r12t1a3200]=[r12r22000],Q2Q1a3=Q2[r13s1a33a4300]=[r13r23:=1r22(t1s1+a32a33)t2:=1r22(t1a33a32s1)a4300]
Q2Q1a4=Q2[00a34a44a5400]=[0r24:=1r22a32a34s2:=1r22t1a34a44a5400],Q2Q1ak=ak for k>4.

Now, for $j = 3, \ldots, n-1$ we take,

$Q_j := \begin{pmatrix} I_{j-1} & & & \\ & \frac{t_{j-1}}{r_{jj}} & \frac{a_{j+1,j}}{r_{jj}} & \\ & -\frac{a_{j+1,j}}{r_{jj}} & \frac{t_{j-1}}{r_{jj}} & \\ & & & I_{n-j-1} \end{pmatrix},$

where rjj:=tj12+aj+1,j2.

This gives,

QjQ1ak=[0k3rk2,krk1,krk,k0nk],

for kj, and,

QjQ1aj+1=[0rj1,j+1rj,j+1:=1rjj(tj1sj1+aj+1,jaj+1,j+1)tj:=1rjj(tj1aj+1,j+1sj1aj+1,j)aj+2,j+100],QjQ1aj+2=[0rj,j+2:=1rjjaj+1,jaj+1,j+2sj:=1rjjtj1aj+1,j+2aj+2,j+200].

Finally we define, rnn=1rn1,n1(tn2an,nan,n1sn2), to obtain,

Qn1Q1A=[r11r12r13000r22r23r2400rn2,n2rn2,n1rn2,n0rn1,n1rn1,n0rn,n]=:R

so that $A = QR$, for $Q = Q_1^{-1} \cdots Q_{n-1}^{-1}$, where each matrix $Q_j$ rotates the coordinates $(j, j+1)$ through the angle $\theta_j = \arctan(a_{j+1,j}/t_{j-1})$, and thus each matrix $Q_j^{-1}$ rotates the coordinates $(j, j+1)$ through the angle $-\arctan(a_{j+1,j}/t_{j-1})$. This will be important for Problem 4.3.

END

Problem 4.2 (B) Implement Rotations which represents an orthogonal matrix Q that is a product of rotations of angle θ[k], each acting on the entries k:k+1. That is, it returns Q=Q1Qk such that

$Q_k[k:k+1, k:k+1] = \begin{pmatrix} \cos(\theta[k]) & -\sin(\theta[k]) \\ \sin(\theta[k]) & \cos(\theta[k]) \end{pmatrix}$

Problem 4.3 (A) Combine Rotations and UpperTridiagonal from last problem sheet to implement a banded QR decomposition, bandedqr, that only takes O(n) operations. Hint: use atan(y,x) to determine the angle.

Problem 4.4⋆ (B) Could one redesign the above to only use IEEE operations (addition, multiplication, square roots), avoiding calls to atan, cos, and sin? Would it have been possible to implement this algorithm using reflections? If so, what would be the structure of a matrix whose columns are the vectors of reflections?

SOLUTION

Yes, one could avoid using atan, cos, and sin. At each stage of the algorithm, we only use atan to compute $\theta[i]$, which we then use to compute $Q_i^{-1}$ and thus $Q$ at the end. Instead, one could simply compute $Q_i^{-1}$ directly from $t_{i-1}$ and $a_{i+1,i}$ and save it at each stage. An efficient implementation of $Q_1^{-1} \cdots Q_{n-1}^{-1}$ would then compute $Q$ at the end. One could implement this using Householder reflections as well. Interpreting the 'vectors of reflections' to mean the vectors which define each Householder reflection (padded with 0s at the front so that they are the same dimension and fit into a matrix), the matrix with these columns would be a lower bidiagonal matrix with bandwidths $(l, u) = (1, 0)$.

END

5. PLU decomposition

Problem 5.1⋆ (C) Compute the PLU decompositions for the following matrices:

$\begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & -2 & 1 \\ -3 & -5 & 6 & 1 \\ -1 & 2 & 8 & -2 \end{pmatrix}$

SOLUTION Part 1 Compute the PLU decomposition of,

$\begin{pmatrix} 0 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix}$

We begin by permuting the first and second row as |2| > |0|, hence,

$P_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P_1A = \begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}$

We then choose,

$L_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix}, \qquad L_1P_1A = \begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 0 & -2 & 7/2 \end{pmatrix}$

There is no need to permute at this stage, so $P_2 = I_3$, the 3-dimensional identity. Then we can choose,

$L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad L_2P_2L_1P_1A = \begin{pmatrix} 2 & 6 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 9/2 \end{pmatrix} =: U$

Since $P_2 = I_3$, this reduces to $L_2L_1P_1A = U$, i.e. $A = P_1^{-1}L_1^{-1}L_2^{-1}U$. Since $P_1$ simply permutes two rows, it is its own inverse, and $L_1^{-1}L_2^{-1}$ is simply,

$L := L_1^{-1}L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1/2 & -1 & 1 \end{pmatrix}$

Hence, we have A=PLU, where,

P = P₁⁻¹ = [0 1 0; 1 0 0; 0 0 1],  L = [1 0 0; 0 1 0; 1/2 −1 1],  U = [2 6 1; 0 2 1; 0 0 9/2]

We can quickly check:
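The Julia check cell was lost in compilation; an equivalent NumPy check (an illustrative stand-in):

```python
import numpy as np

A = np.array([[0, 2, 1], [2, 6, 1], [1, 1, 4]], dtype=float)
P = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
L = np.array([[1, 0, 0], [0, 1, 0], [0.5, -1, 1]])
U = np.array([[2, 6, 1], [0, 2, 1], [0, 0, 4.5]])
# P @ L @ U should reproduce A exactly
print(np.allclose(P @ L @ U, A))  # True
```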

Part 2

Find the PLU decomposition of,

A = [1 2 −1 0; 2 4 −2 1; −3 −5 6 1; −1 2 8 −2]

We see that we must start with the permutation,

P₁ := [0 0 1 0; 0 1 0 0; 1 0 0 0; 0 0 0 1],  P₁A = [−3 −5 6 1; 2 4 −2 1; 1 2 −1 0; −1 2 8 −2]

We then choose L1,

L₁ := [1 0 0 0; 2/3 1 0 0; 1/3 0 1 0; −1/3 0 0 1],  L₁P₁A = [−3 −5 6 1; 0 2/3 2 5/3; 0 1/3 1 1/3; 0 11/3 6 −7/3]

We then choose P2,

P₂ := [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0],  P₂L₁P₁A = [−3 −5 6 1; 0 11/3 6 −7/3; 0 1/3 1 1/3; 0 2/3 2 5/3]

We then choose L2,

L₂ := [1 0 0 0; 0 1 0 0; 0 −1/11 1 0; 0 −2/11 0 1],  L₂P₂L₁P₁A = [−3 −5 6 1; 0 11/3 6 −7/3; 0 0 5/11 6/11; 0 0 10/11 23/11]

We then choose P3 to swap the third and fourth rows,

P₃ := [1 0 0 0; 0 1 0 0; 0 0 0 1; 0 0 1 0],  P₃L₂P₂L₁P₁A = [−3 −5 6 1; 0 11/3 6 −7/3; 0 0 10/11 23/11; 0 0 5/11 6/11]

We complete the triangularisation by choosing L3,

L₃ := [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 −1/2 1],  L₃P₃L₂P₂L₁P₁A = [−3 −5 6 1; 0 11/3 6 −7/3; 0 0 10/11 23/11; 0 0 0 −1/2] =: U

We now consider L₃P₃L₂P₂L₁P₁ = L₃L̃₂L̃₁P₃P₂P₁, where L̃₁ and L̃₂ satisfy,

P₃P₂L₁ = L̃₁P₃P₂,  P₃L₂ = L̃₂P₃

These can be computed via,

L̃₁ = P₃P₂L₁P₂⁻¹P₃⁻¹,  L̃₂ = P₃L₂P₃⁻¹

to obtain,

L̃₁ = [1 0 0 0; −1/3 1 0 0; 2/3 0 1 0; 1/3 0 0 1],  L̃₂ = [1 0 0 0; 0 1 0 0; 0 −2/11 1 0; 0 −1/11 0 1]

This gives us,

L⁻¹ = L₃L̃₂L̃₁,

from which it follows that,

L = L̃₁⁻¹L̃₂⁻¹L₃⁻¹ = [1 0 0 0; 1/3 1 0 0; −2/3 2/11 1 0; −1/3 1/11 1/2 1]

Finally, we have that,

P = P₁⁻¹P₂⁻¹P₃⁻¹ = [0 0 0 1; 0 0 1 0; 1 0 0 0; 0 1 0 0],

so that A = PLU, with,

P = [0 0 0 1; 0 0 1 0; 1 0 0 0; 0 1 0 0],  L = [1 0 0 0; 1/3 1 0 0; −2/3 2/11 1 0; −1/3 1/11 1/2 1],  U = [−3 −5 6 1; 0 11/3 6 −7/3; 0 0 10/11 23/11; 0 0 0 −1/2]

To check:
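The Julia check cell was again lost; the same verification in NumPy (an illustrative stand-in):

```python
import numpy as np

A = np.array([[1, 2, -1, 0], [2, 4, -2, 1], [-3, -5, 6, 1], [-1, 2, 8, -2]], dtype=float)
P = np.array([[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
L = np.array([[1, 0, 0, 0],
              [1/3, 1, 0, 0],
              [-2/3, 2/11, 1, 0],
              [-1/3, 1/11, 1/2, 1]])
U = np.array([[-3, -5, 6, 1],
              [0, 11/3, 6, -7/3],
              [0, 0, 10/11, 23/11],
              [0, 0, 0, -1/2]])
print(np.allclose(P @ L @ U, A))  # True
```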

END

MATH50003 Numerical Analysis: Problem Sheet 5

This problem sheet explores positive definite matrices, Cholesky decompositions, matrix norms, and the singular value decomposition.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

1. Positive definite matrices and Cholesky decompositions

Problem 1.1⋆ (C) Use the Cholesky decomposition to determine which of the following matrices are symmetric positive definite:

[1 −1; −1 3],  [1 2 2; 2 1 2; 2 2 1],  [3 2 1; 2 4 2; 1 2 5],  [4 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4]

SOLUTION

A matrix is symmetric positive definite (SPD) if and only if it has a Cholesky decomposition, so the task here is really just to compute Cholesky decompositions (by hand). Since our goal is only to tell whether the Cholesky decompositions exist, we do not have to compute the Lₖ's. We only need to check whether the decomposition process can be carried through to the end.

Matrix 1

A₀ = [1 −1; −1 3]

A₁ = 3 − (−1)(−1)/1 = 2 > 0, so Matrix 1 is SPD.

Matrix 2

A₀ = [1 2 2; 2 1 2; 2 2 1]

A₁ = [1 2; 2 1] − [2; 2][2 2] = [−3 −2; −2 −3]

A₁[1,1] < 0, so Matrix 2 is not SPD.

Matrix 3

A₀ = [3 2 1; 2 4 2; 1 2 5]

A₁ = [4 2; 2 5] − (1/3)[2; 1][2 1] = (1/3)[8 4; 4 13]

3A₂ = 13 − 4·4/8 = 11 > 0, so Matrix 3 is SPD.

Matrix 4

A₀ = [4 2 2 1; 2 4 2 2; 2 2 4 2; 1 2 2 4]

A₁ = [4 2 2; 2 4 2; 2 2 4] − (1/4)[2; 2; 1][2 2 1] = (1/4)[12 4 6; 4 12 6; 6 6 15]

4A₂ = [12 6; 6 15] − (1/12)[4; 6][4 6] = (4/3)[8 3; 3 9], and 3A₃ = 9 − 3·3/8 = 63/8 > 0, so Matrix 4 is SPD.

We can check that we did this correctly by running the following in Julia:
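The Julia cell was lost in compilation; an equivalent NumPy check, using the fact that a Cholesky routine succeeds exactly when the input is SPD:

```python
import numpy as np

def is_spd(M):
    """Cholesky succeeds exactly when M is symmetric positive definite."""
    try:
        np.linalg.cholesky(np.array(M, dtype=float))
        return True
    except np.linalg.LinAlgError:
        return False

print(is_spd([[1, -1], [-1, 3]]))                                        # True
print(is_spd([[1, 2, 2], [2, 1, 2], [2, 2, 1]]))                         # False
print(is_spd([[3, 2, 1], [2, 4, 2], [1, 2, 5]]))                         # True
print(is_spd([[4, 2, 2, 1], [2, 4, 2, 2], [2, 2, 4, 2], [1, 2, 2, 4]]))  # True
```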

END

Problem 1.2⋆ (B) Recall that an inner product ⟨𝐱, 𝐲⟩ on ℝⁿ satisfies, for all 𝐱, 𝐲, 𝐳 ∈ ℝⁿ and a, b ∈ ℝ:

  1. Symmetry: ⟨𝐱, 𝐲⟩ = ⟨𝐲, 𝐱⟩
  2. Linearity: ⟨a𝐱 + b𝐲, 𝐳⟩ = a⟨𝐱, 𝐳⟩ + b⟨𝐲, 𝐳⟩
  3. Positive-definiteness: ⟨𝐱, 𝐱⟩ > 0 for 𝐱 ≠ 0

Prove that 𝐱,𝐲 is an inner product if and only if

𝐱,𝐲=𝐱K𝐲

where K is a symmetric positive definite matrix.

SOLUTION

We begin by showing that ⟨𝐱, 𝐲⟩ = 𝐱ᵀK𝐲 with K SPD defines an inner product, by verifying the three properties. For symmetry, since 𝐱ᵀK𝐲 is a scalar and K = Kᵀ, we find ⟨𝐱, 𝐲⟩ = 𝐱ᵀK𝐲 = (𝐱ᵀK𝐲)ᵀ = 𝐲ᵀKᵀ𝐱 = 𝐲ᵀK𝐱 = ⟨𝐲, 𝐱⟩. For linearity: ⟨a𝐱 + b𝐲, 𝐳⟩ = (a𝐱 + b𝐲)ᵀK𝐳 = a𝐱ᵀK𝐳 + b𝐲ᵀK𝐳 = a⟨𝐱, 𝐳⟩ + b⟨𝐲, 𝐳⟩. Positive-definiteness of the matrix K immediately yields ⟨𝐱, 𝐱⟩ = 𝐱ᵀK𝐱 > 0 for 𝐱 ≠ 0. Now we turn to the converse result, i.e. that for any inner product ⟨𝐱, 𝐲⟩ there exists a symmetric positive definite matrix K such that ⟨𝐱, 𝐲⟩ = 𝐱ᵀK𝐲. Define the entries of K by Kᵢⱼ = ⟨𝐞ᵢ, 𝐞ⱼ⟩, where 𝐞ⱼ is the j-th standard basis vector. By linearity (in both arguments, using symmetry), any inner product on ℝⁿ can be written as ⟨𝐱, 𝐲⟩ = ∑ₖ₌₁ⁿ ∑ₗ₌₁ⁿ xₖyₗ⟨𝐞ₖ, 𝐞ₗ⟩, and with the elements of K defined as above this is precisely ⟨𝐱, 𝐲⟩ = ∑ₖ₌₁ⁿ ∑ₗ₌₁ⁿ xₖKₖₗyₗ = 𝐱ᵀK𝐲. What remains is to show that this K is symmetric positive definite. Symmetry is an immediate consequence of the symmetry of its entries: Kᵢⱼ = ⟨𝐞ᵢ, 𝐞ⱼ⟩ = ⟨𝐞ⱼ, 𝐞ᵢ⟩ = Kⱼᵢ. Finally, positive definiteness follows from the positive definiteness of the inner product: 𝐱ᵀK𝐱 = ⟨𝐱, 𝐱⟩ > 0 for 𝐱 ≠ 0.

END

Problem 1.3⋆ (A) Show that a matrix is symmetric positive definite if and only if it has a Cholesky decomposition of the form

A=UU

where U is upper triangular with positive entries on the diagonal.

SOLUTION

 

We didn't discuss this but note that a symmetric positive definite matrix has strictly positive eigenvalues: for a normalised eigenvector 𝐯 with A𝐯 = λ𝐯 we have

λ = λ𝐯ᵀ𝐯 = 𝐯ᵀA𝐯 > 0.

Thus such matrices are always invertible, and the inverse of an SPD matrix is again SPD. Any SPD matrix has a Cholesky decomposition of standard form A⁻¹ = LLᵀ where L is lower triangular, so A = L⁻ᵀL⁻¹ = UUᵀ with U = L⁻ᵀ, which is of the desired form since L⁻¹ is lower triangular and L⁻ᵀ is upper triangular. The positive entries on the diagonal follow directly because this is the case for the Cholesky factors of the original matrix. Thus, since every symmetric positive definite matrix is the inverse of a symmetric positive definite matrix, they all have a decomposition A = UUᵀ (built from the Cholesky factors of A⁻¹).

Alternatively, we can replicate the procedure of computing the Cholesky decomposition beginning in the bottom right instead of the top left. Write:

A = [K 𝐯; 𝐯ᵀ α] = U₁ [K − 𝐯𝐯ᵀ/α 0; 0ᵀ 1] U₁ᵀ,  where U₁ = [I 𝐯/√α; 0ᵀ √α]

The induction proceeds as in the lower triangular case.

END

Problem 1.4⋆ (A) Prove that the following n×n matrix is symmetric positive definite for any n:

Δₙ := [2 −1; −1 2 −1; ⋱ ⋱ ⋱; −1 2 −1; −1 2]  (tridiagonal, with 2 on the diagonal and −1 on the sub- and super-diagonals)

Deduce its two Cholesky decompositions: Δn=LnLn=UnUn where Ln is lower triangular and Un is upper triangular.

SOLUTION

We first prove that Ln is lower bidiagonal by induction. Let A be a tridiagonal SPD matrix:

A = [α β𝐞₁ᵀ; β𝐞₁ K], where K is again tridiagonal. Denote the Cholesky decomposition of A by A = LLᵀ. Recalling the proof of the Theorem (Cholesky & SPD) from the lecture, we can write

L = [√α 0ᵀ; (β/√α)𝐞₁ L̃], where L̃ satisfies L̃L̃ᵀ = K − (β²/α)𝐞₁𝐞₁ᵀ, which is a tridiagonal matrix smaller than A. Since L is lower bidiagonal when A is 1×1, we know by induction that L is lower bidiagonal for A of any size.

Once we know that Lₙ is lower bidiagonal, we can write it as Lₙ = [a₁; b₁ a₂; ⋱ ⋱; bₙ₋₁ aₙ], which we substitute into Δₙ = LₙLₙᵀ to get: a₁² = 2, aₖbₖ = −1 and bₖ² + aₖ₊₁² = 2 for k = 1, …, n−1.

Now we solve the recurrence. Substituting the second equation into the third one: 1/aₖ² + aₖ₊₁² = 2. Let cₖ = aₖ²: 1/cₖ + cₖ₊₁ = 2. Consider the fixed point of this recurrence: 1/x + x = 2 ⟺ (x − 1)² = 0 has the double root x = 1, which hints that we should consider 1/(cₖ − 1). In fact, the recurrence is equivalent to 1/(cₖ₊₁ − 1) = 1/(cₖ − 1) + 1. Recalling that c₁ = 2, we get 1/(cₖ − 1) = k. As a result, cₖ = (k+1)/k, so aₖ = √((k+1)/k) and bₖ = −√(k/(k+1)), hence we know Lₙ.

We can apply the same process to Un, but this is a special case since flipping Δn horizontally and vertically gives itself: PΔnP=Δn where

P = [1; ⋰; 1]  (ones on the anti-diagonal)

is the permutation that reverses a vector. So we can also flip Ln to get Un:

Un=PLnP

so that UnUn=PLnPPLnP=PΔnP=Δn.

Alternatively one can use the procedure from Problem 1.3. That is, write:

Δₙ = [Δₙ₋₁ −𝐞ₙ₋₁; −𝐞ₙ₋₁ᵀ 2] = U₁ [Δₙ₋₁ − 𝐞ₙ₋₁𝐞ₙ₋₁ᵀ/2 0; 0ᵀ 1] U₁ᵀ,  where U₁ = [I −𝐞ₙ₋₁/√2; 0ᵀ √2]

Continuing proceeds as above.

END

Problem 1.5 (B) SymTridiagonal(dv, ev) is a type for representing symmetric tridiagonal matrices (that is, SymTridiagonal(dv, ev) == Tridiagonal(ev, dv, ev)). Complete the following implementation of cholesky to return a Bidiagonal cholesky factor in O(n) operations, and check your result compared to your solution of Problem 1.4 for n = 1_000_000.
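The Julia skeleton to complete was lost in compilation. A hedged NumPy sketch of the O(n) algorithm (the function and variable names are ours): sweep down the matrix, computing each diagonal entry of L by a square root and each subdiagonal entry by a division, exactly as in the scalar recurrence of Problem 1.4.

```python
import numpy as np

def cholesky_symtridiagonal(dv, ev):
    """Lower-bidiagonal Cholesky factor L of SymTridiagonal(dv, ev), A = L Lᵀ.
    Returns (d, e): the diagonal and subdiagonal of L. O(n) operations."""
    n = len(dv)
    d = np.empty(n)
    e = np.empty(n - 1)
    d[0] = np.sqrt(dv[0])
    for k in range(n - 1):
        e[k] = ev[k] / d[k]                        # L[k+1, k]
        d[k + 1] = np.sqrt(dv[k + 1] - e[k] ** 2)  # L[k+1, k+1]
    return d, e
```

For Δₙ (dv all 2, ev all −1) the output matches the closed form aₖ = √((k+1)/k), bₖ = −√(k/(k+1)) from Problem 1.4.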

2. Matrix norms

Problem 2.1⋆ (B) Prove the following:

‖A‖∞ = maxₖ ‖A[k,:]‖₁,  and  sup_{𝐱≠0} ‖A𝐱‖∞/‖𝐱‖₁ = ‖vec(A)‖∞ = max_{k,j} |a_{kj}|

SOLUTION

Step 1. upper bounds

‖A𝐱‖∞ = maxₖ |∑ⱼ a_{kj}xⱼ| ≤ maxₖ ∑ⱼ |a_{kj}||xⱼ| ≤ { ‖𝐱‖∞ maxₖ ∑ⱼ |a_{kj}| = ‖𝐱‖∞ maxₖ ‖A[k,:]‖₁;  or  (max_{k,j} |a_{kj}|) ∑ⱼ |xⱼ| = ‖𝐱‖₁ ‖vec(A)‖∞ }

Step 2.1. meeting the upper bound (A1)

Let a_{lm} be the entry of A with maximum absolute value and take 𝐱 = 𝐞ₘ. Then ‖A𝐱‖∞ = maxₖ |∑ⱼ a_{kj}xⱼ| = maxₖ |a_{km}| = |a_{lm}|, while ‖𝐱‖₁‖vec(A)‖∞ = 1·|a_{lm}|.

Step 2.2. meeting the upper bound (A)

Let A[n,:] be the row of A with maximum 1-norm and take 𝐱 = sign.(A[n,:]). Then |∑ⱼ a_{kj}xⱼ| = ∑ⱼ |a_{nj}| = ‖A[n,:]‖₁ for k = n, and ≤ ∑ⱼ |a_{kj}| = ‖A[k,:]‖₁ for k ≠ n, so ‖A𝐱‖∞ = maxₖ ‖A[k,:]‖₁, while ‖𝐱‖∞ maxₖ ‖A[k,:]‖₁ = 1·maxₖ ‖A[k,:]‖₁.

Conclusion

In both cases, equality can hold, so the upper bounds are actually maxima.
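A quick numerical sanity check of both identities (a NumPy sketch, since the course's Julia cells are not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))
# ∞-norm equals the maximum row 1-norm
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))  # True
# the 1→∞ bound is attained at x = e_m, with m the column of the largest entry
k, m = np.unravel_index(np.abs(A).argmax(), A.shape)
x = np.zeros(7)
x[m] = 1.0
print(np.isclose(np.abs(A @ x).max(), np.abs(A).max()))  # True
```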

END

Problem 2.2⋆ (B) For a rank-1 matrix A = 𝐱𝐲ᵀ prove that

‖A‖₂ = ‖𝐱‖₂‖𝐲‖₂.

Hint: use the Cauchy–Schwarz inequality.

SOLUTION

‖A𝐳‖₂ = ‖𝐱(𝐲ᵀ𝐳)‖₂ = |𝐲ᵀ𝐳| ‖𝐱‖₂, so it remains to prove that ‖𝐲‖₂ = sup_{𝐳≠0} |𝐲ᵀ𝐳|/‖𝐳‖₂.

By the Cauchy–Schwarz inequality, |𝐲ᵀ𝐳| ≤ ‖𝐲‖₂‖𝐳‖₂, with the two sides being equal when 𝐲 and 𝐳 are linearly dependent, in which case the bound is attained.
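A numerical confirmation (NumPy sketch): the 2-norm of an outer product equals the product of the vector 2-norms.

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(4), rng.standard_normal(6)
A = np.outer(x, y)                      # A = x yᵀ has rank 1
print(np.isclose(np.linalg.norm(A, 2),  # largest singular value of A
                 np.linalg.norm(x) * np.linalg.norm(y)))  # True
```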

END

Problem 2.3⋆ (B) Show for any orthogonal matrix Q ∈ ℝ^{m×m} and matrix A ∈ ℝ^{m×n} that

‖QA‖_F = ‖A‖_F

by first showing that ‖A‖_F = √(tr(AᵀA)), using the trace of an m×m matrix:

tr(A)=a11+a22++amm.

SOLUTION

tr(AᵀA) = ∑ₖ (AᵀA)[k,k] = ∑ₖ ∑ⱼ Aᵀ[k,j]A[j,k] = ∑ₖ ∑ⱼ A[j,k]² = ‖A‖_F².

On the other hand, tr(AᵀA) = tr(AᵀQᵀQA) = tr((QA)ᵀ(QA)) = ‖QA‖_F², so ‖QA‖_F = ‖A‖_F.

END

3. Singular value decomposition

Problem 3.1⋆ (B) Show that ‖A‖₂ ≤ ‖A‖_F ≤ √r ‖A‖₂, where r is the rank of A.

SOLUTION

From Problem 2.3 use the fact that ‖A‖_F² = tr(AᵀA), where A ∈ ℝ^{m×n}.

Hence,

‖A‖_F² = tr(AᵀA) = σ₁² + ⋯ + σₙ²

where σ₁ ≥ ⋯ ≥ σₙ ≥ 0 are the singular values of A, since the σᵢ² are the eigenvalues of AᵀA.

Knowing that ‖A‖₂² = σ₁², we have ‖A‖₂² ≤ ‖A‖_F².

Moreover, if the rank of A is r then σ_{r+1} = ⋯ = σₙ = 0, and since σ₁ ≥ ⋯ ≥ σₙ ≥ 0 we have that

‖A‖_F² = σ₁² + ⋯ + σₙ² = σ₁² + ⋯ + σᵣ² ≤ rσ₁² = r‖A‖₂²

Hence,

‖A‖₂ ≤ ‖A‖_F ≤ √r ‖A‖₂.

END

Problem 3.2 (A) Consider functions sampled on a (n+1)×(n+1) 2D grid (xk,yj)=(k/n,j/n) where k,j=0,,n. For n=100, what is the lowest rank r such that the best rank-r approximation to the samples that is accurate to within 105 accuracy for the following functions:

(x+2y)2,cos(sinxey),1/(x+y+1),sign(xy)

For which examples does the answer change when n=1000?

SOLUTION

END

Problem 3.3⋆ (B) For A ∈ ℝ^{m×n} define the pseudo-inverse:

A⁺ := VΣ⁻¹Uᵀ.

Show that it satisfies the Moore–Penrose conditions:

  1. AA⁺A = A
  2. A⁺AA⁺ = A⁺
  3. (AA⁺)ᵀ = AA⁺ and (A⁺A)ᵀ = A⁺A

SOLUTION

Let A = UΣVᵀ and A⁺ := VΣ⁻¹Uᵀ, where U ∈ ℝ^{m×r} and V ∈ ℝ^{n×r}. Note that UᵀU = I_r and VᵀV = I_r.

  1. We have
AA⁺A = UΣVᵀVΣ⁻¹UᵀUΣVᵀ = UΣΣ⁻¹ΣVᵀ = UΣVᵀ = A
  2. Moreover,
A⁺AA⁺ = VΣ⁻¹UᵀUΣVᵀVΣ⁻¹Uᵀ = VΣ⁻¹ΣΣ⁻¹Uᵀ = VΣ⁻¹Uᵀ = A⁺
  3. Finally,
(AA⁺)ᵀ = (A⁺)ᵀAᵀ = UΣ⁻¹VᵀVΣUᵀ = UUᵀ = UΣVᵀVΣ⁻¹Uᵀ = AA⁺, and (A⁺A)ᵀ = Aᵀ(A⁺)ᵀ = VΣUᵀUΣ⁻¹Vᵀ = VVᵀ = VΣ⁻¹UᵀUΣVᵀ = A⁺A
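A numerical confirmation of all four Moore–Penrose conditions, using NumPy's pinv (which is exactly VΣ⁻¹Uᵀ):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Ap = np.linalg.pinv(A)  # V Σ⁻¹ Uᵀ
print(np.allclose(A @ Ap @ A, A),
      np.allclose(Ap @ A @ Ap, Ap),
      np.allclose((A @ Ap).T, A @ Ap),
      np.allclose((Ap @ A).T, Ap @ A))  # True True True True
```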

END

Problem 3.4⋆ (A) Show for A ∈ ℝ^{m×n} with m ≥ n and full rank that 𝐱 = A⁺𝐛 is the least squares solution, i.e., minimises ‖A𝐱 − 𝐛‖₂. Hint: extend U in the SVD to be a square orthogonal matrix.

SOLUTION

The proof mimics that of the QR decomposition. Write A=UΣV and let

Ũ=[UK]

so that Ũ is orthogonal. We use the fact orthogonal matrices do not change norms:

‖A𝐱 − 𝐛‖₂² = ‖UΣVᵀ𝐱 − 𝐛‖₂² = ‖Ũᵀ(UΣVᵀ𝐱 − 𝐛)‖₂² = ‖[Iₙ; O]ΣVᵀ𝐱 − [Uᵀ; Kᵀ]𝐛‖₂² = ‖ΣVᵀ𝐱 − Uᵀ𝐛‖₂² + ‖Kᵀ𝐛‖₂²

The second term is independent of 𝐱. The first term is minimised when zero:

‖ΣVᵀ𝐱 − Uᵀ𝐛‖₂ = ‖ΣVᵀVΣ⁻¹Uᵀ𝐛 − Uᵀ𝐛‖₂ = 0

END

Problem 3.5⋆ (A) If Am×n has a non-empty kernel there are multiple solutions to the least squares problem as we can add any element of the kernel. Show that 𝐱=A+𝐛 gives the least squares solution such that 𝐱2 is minimised.

SOLUTION

Let 𝐱 = A⁺𝐛 and let 𝐱 + 𝐤 be another least squares solution, i.e.

‖A𝐱 − 𝐛‖₂ = ‖A(𝐱 + 𝐤) − 𝐛‖₂

Following the previous part we deduce:

ΣVᵀ(𝐱 + 𝐤) = Uᵀ𝐛  ⟹  Vᵀ𝐤 = 0

As 𝐱 = V𝐜 (with 𝐜 = Σ⁻¹Uᵀ𝐛) lies in the span of the columns of V, we have 𝐱ᵀ𝐤 = 0. Thus

‖𝐱 + 𝐤‖₂² = ‖𝐱‖₂² + ‖𝐤‖₂²

which is minimised when 𝐤=0.

END

MATH50003 Numerical Analysis: Problem Sheet 6

This problem sheet explores condition numbers, indefinite integration, and Euler's method.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

1. Condition numbers

Problem 1.1⋆ (B) Prove that, if |εₖ| ≤ ε and nε < 1, then

∏ₖ₌₁ⁿ (1 + εₖ) = 1 + θₙ

for some constant θₙ satisfying |θₙ| ≤ nε/(1 − nε).

SOLUTION

∏ₖ₌₁ⁿ (1 + εₖ) ≤ (1 + ε)ⁿ = ∑ₖ₌₀ⁿ C(n,k)εᵏ ≤ 1 + ∑ₖ₌₁ⁿ nᵏεᵏ ≤ 1 + ∑ₖ₌₁^∞ (nε)ᵏ = 1 + nε/(1 − nε). Similarly, ∏ₖ₌₁ⁿ (1 + εₖ) ≥ (1 − ε)ⁿ = ∑ₖ₌₀ⁿ C(n,k)(−ε)ᵏ ≥ 1 − ∑ₖ₌₁ⁿ nᵏεᵏ ≥ 1 − nε/(1 − nε).

Problem 1.2⋆ (B) Let A, B ∈ ℝ^{m×n}. Prove that if the columns satisfy ‖𝐚ⱼ‖₂ ≤ ‖𝐛ⱼ‖₂ then ‖A‖_F ≤ ‖B‖_F and ‖A‖₂ ≤ √(rank(B)) ‖B‖₂.

SOLUTION

Recalling Problem Sheet 5, Problem 2.3⋆, we know that ‖A‖_F = √(∑_{k,j} A[k,j]²) = √(∑ⱼ ‖𝐚ⱼ‖₂²) and ‖B‖_F = √(∑ⱼ ‖𝐛ⱼ‖₂²). Since ‖𝐚ⱼ‖₂ ≤ ‖𝐛ⱼ‖₂, we have ‖A‖_F ≤ ‖B‖_F.

Recalling Problem Sheet 5, Problem 3.1⋆, we have ‖A‖₂ ≤ ‖A‖_F ≤ ‖B‖_F ≤ √(rank(B)) ‖B‖₂.

Problem 1.3⋆ (C) Compute the 1-norm, 2-norm, and ∞-norm condition numbers for the following matrices:

[1 2; 3 4],  [1/3 1/5; 0 1/7],  diag(1, 1/2, …, 1/2ⁿ)

(Hint: recall that the singular values of a matrix A are the square roots of the eigenvalues of the Gram matrix AA.)

SOLUTION

A = [1 2; 3 4],  A⁻¹ = −(1/2)[4 −2; −3 1]
B = [1/3 1/5; 0 1/7],  B⁻¹ = 21[1/7 −1/5; 0 1/3]

‖A‖₁ = 6, ‖A⁻¹‖₁ = 7/2, so κ₁(A) = 21.

‖A‖∞ = 7, ‖A⁻¹‖∞ = 3, so κ∞(A) = 21.

‖B‖₁ = 12/35, ‖B⁻¹‖₁ = 21×8/15 = 56/5, so κ₁(B) = 96/25.

‖B‖∞ = 8/15, ‖B⁻¹‖∞ = 21×12/35 = 36/5, so κ∞(B) = 96/25.

Finally, for the 2-norms. κ₂(A): for A = [1 2; 3 4], the singular values are σ₁ = √λ₁ and σ₂ = √λ₂, where λ₁ and λ₂ are the eigenvalues of AᵀA.

AᵀA = [10 14; 14 20],

so an eigenvalue λ of AᵀA must satisfy,

(10 − λ)(20 − λ) − 196 = 0 ⟺ λ = 15 ± √221.

The larger eigenvalue corresponds to σ₁, so σ₁ = √(15 + √221), and the smaller to σ₂, so σ₂ = √(15 − √221). Finally, we have ‖A‖₂ = σ₁, ‖A⁻¹‖₂ = 1/σ₂, and so κ₂(A) = √((15 + √221)/(15 − √221)).

κ₂(B): For

B = [1/3 1/5; 0 1/7],

the singular values are σ₁ = √λ₁ and σ₂ = √λ₂, where λ₁ and λ₂ are the eigenvalues of BᵀB.

BᵀB = [1/9 1/15; 1/15 74/1225].

An eigenvalue λ must satisfy (1/9 − λ)(74/1225 − λ) − 1/225 = 0 ⟺ λ = (1891 ± 29√2941)/22050. With the same logic as above, we can then deduce that ‖B‖₂ = √((1891 + 29√2941)/22050) and ‖B⁻¹‖₂ = √(22050/(1891 − 29√2941)), so that,

κ₂(B) = √((1891 + 29√2941)/(1891 − 29√2941))

For,

Aₙ = diag(1, 1/2, …, 1/2ⁿ),  Aₙ⁻¹ = diag(1, 2, …, 2ⁿ)

It is clear that ‖Aₙ‖₁ = ‖Aₙ‖∞ = 1 and ‖Aₙ⁻¹‖₁ = ‖Aₙ⁻¹‖∞ = 2ⁿ, so κ₁(Aₙ) = κ∞(Aₙ) = 2ⁿ. Moreover, we can clearly see the singular values: σ₁ = 1, σ₂ = 1/2, …, σₙ₊₁ = 1/2ⁿ. So ‖Aₙ‖₂ = 1, ‖Aₙ⁻¹‖₂ = 2ⁿ and κ₂(Aₙ) = 2ⁿ.

Problem 1.4 (B) State a bound on the relative error on A𝐯 for 𝐯2=1 for the following matrices:

[1/3 1/5; 0 1/10³],  diag(1, 1/2, …, 1/2¹⁰)

Compute the relative error in computing A𝐯 (using big for a high-precision version to compare against) where 𝐯 is the last column of V in the SVD A=UΣV, computed using the svd command with Float64 inputs. How does the error compare to the predicted error bound?

SOLUTION

The Theorem (relative error for matrix-vector multiplication) tells us that,

‖δA𝐯‖/‖A𝐯‖ ≤ κ(A)ε,

if the relative perturbation error satisfies ‖δA‖ = ε‖A‖. For the 2-norm, we have,

‖δA‖₂ ≤ min(m,n)√n (εₘ/2) ‖A‖₂.

The condition number of the first matrix is ≈ 453.33 (see code below to compute that), and ε defined above is 2√2εₘ/2 = √2εₘ ≈ 3.14 × 10⁻¹⁶, so the bound on the relative error is ≈ 1.42 × 10⁻¹³. The condition number of the second matrix is 2¹⁰ by the question above, and ε defined above is ≈ √10 εₘ ≈ 7.02 × 10⁻¹⁶; the bound on the relative error in this case is then:

≈ 7.19 × 10⁻¹³

Note that the second matrix is diagonal, so computing A𝐯 is exact: the relative error is 0, well within the upper bound.
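The referenced code cell did not survive compilation; a NumPy stand-in computing the condition number and the predicted bound for the first matrix:

```python
import numpy as np

A = np.array([[1/3, 1/5], [0, 1/10**3]])
kappa = np.linalg.cond(A, 2)
print(kappa)                             # ≈ 453.33
eps = np.sqrt(2) * np.finfo(float).eps   # ≈ 3.14e-16
print(kappa * eps)                       # ≈ 1.42e-13, the predicted bound
```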

2. Indefinite integration

Problem 2.1 (B) Implement backward differences to approximate indefinite-integration. How does the error compare to forward and mid-point versions for f(x)=cosx on the interval [0,1]? Use the method to approximate the integrals of

exp(eˣ cos x + sin x),  ∏ₖ₌₁¹⁰⁰⁰ (x/k − 1),  and f₁₀₀₀ˢ(x)

to 3 digits, where f1000s(x) was defined in PS2.

SOLUTION

We can implement the backward difference solution as follows:
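The Julia implementation was lost in compilation; a hedged NumPy sketch (the function name is ours). Backward differences give the lower bidiagonal system (uₖ − uₖ₋₁)/h = f(xₖ), which forward-substitution solves as a cumulative sum:

```python
import numpy as np

def indefinite_backward(f, n):
    """u(0) = 0, u' = f on [0,1] via backward differences (u_k - u_{k-1})/h = f(x_k)."""
    x = np.linspace(0, 1, n + 1)
    h = 1 / n
    u = np.zeros(n + 1)
    u[1:] = h * np.cumsum(f(x[1:]))   # forward-substitution of the bidiagonal system
    return x, u

for n in (100, 1000):
    x, u = indefinite_backward(np.cos, n)
    print(n, np.abs(u - np.sin(x)).max())   # first-order errors, like the forward method
```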

Comparing each method's errors, we see that the backward method has the same error as the forward method:

Part two:

Problem 2.2 (A) Implement indefinite-integration where we take the average of the two grid points:

(u'(xₖ₊₁) + u'(xₖ))/2 ≈ (uₖ₊₁ − uₖ)/h

What is the observed rate-of-convergence using the ∞-norm for f(x)=cosx on the interval [0,1]? Does the method converge if the error is measured in the 1-norm?

SOLUTION

The implementation is as follows:
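The Julia cell is missing; a hedged NumPy sketch of the averaged (trapezium-rule) scheme:

```python
import numpy as np

def indefinite_average(f, n):
    """u(0) = 0, u' = f on [0,1] via (u_{k+1} - u_k)/h = (f(x_k) + f(x_{k+1}))/2."""
    x = np.linspace(0, 1, n + 1)
    h = 1 / n
    u = np.zeros(n + 1)
    u[1:] = np.cumsum(h * (f(x[:-1]) + f(x[1:])) / 2)
    return x, u

for n in (100, 1000):
    x, u = indefinite_average(np.cos, n)
    print(n, np.abs(u - np.sin(x)).max())   # errors fall like O(h²)
```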

Comparing the error to the midpoint method, we see that the errors are very similar:

Now looking at the L1 norm, we see it is converging, but to a smaller error before it starts to increase:

3. Euler methods

Problem 3.1 (B) Solve the following ODEs using forward and/or backward Euler and increasing n, the number of time-steps, until u(1) is determined to 3 digits:

u(0) = 1, u'(t) = cos(t)u(t) + t
v(0) = 1, v'(0) = 0, v''(t) = cos(t)v(t) + t
w(0) = 1, w'(0) = 0, w''(t) = t·w(t) + 2w(t)²

If we increase the initial condition to w(0) = c > 1, the solution may blow up in finite time. Find the smallest positive integer c such that the numerical approximation suggests the equation does not blow up.

SOLUTION
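The Julia runs were lost in compilation; a hedged NumPy sketch of forward Euler for the first equation (the other two follow the same pattern after rewriting as first-order systems):

```python
import numpy as np

def forward_euler_u(n):
    """Forward Euler for u(0) = 1, u' = cos(t)u + t on [0,1]."""
    h = 1 / n
    u = 1.0
    for k in range(n):
        t = k * h
        u += h * (np.cos(t) * u + t)
    return u

for n in (100, 1000, 10000):
    print(n, forward_euler_u(n))   # settles to ≈ 2.96 as n increases
```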

We see that u(1)=2.96 to three digits.

We see that v(1) is 1.66 to three digits. Finally,

For c=1, we see that w(1)=2.80 to 3 digits. Now consider for c>1:

It appears that c=2 is the smallest positive integer greater than 1 for which the numerical approximation suggests the equation does not blow up.

Problem 3.2⋆ (B) For an evenly spaced grid t1,,tn, use the approximation

(u'(tₖ₊₁) + u'(tₖ))/2 ≈ (uₖ₊₁ − uₖ)/h

to recast

u(0) = c,  u'(t) = a(t)u(t) + f(t)

as a lower bidiagonal linear system. Use forward-substitution to extend this to vector linear problems:

𝐮(0) = 𝐜,  𝐮'(t) = A(t)𝐮(t) + 𝐟(t)

SOLUTION

We have,

(uₖ₊₁ − uₖ)/h ≈ (u'(tₖ₊₁) + u'(tₖ))/2 = (a(tₖ₊₁)uₖ₊₁ + a(tₖ)uₖ)/2 + (f(tₖ₊₁) + f(tₖ))/2,

so we can write,

(1/h − a(tₖ₊₁)/2)uₖ₊₁ + (−1/h − a(tₖ)/2)uₖ = (f(tₖ₊₁) + f(tₖ))/2.

With the initial condition u(0) = c, we can write the whole system as,

[1; (−1/h − a(t₁)/2) (1/h − a(t₂)/2); ⋱ ⋱; (−1/h − a(tₙ₋₁)/2) (1/h − a(tₙ)/2)] 𝐮 = [c; (f(t₁) + f(t₂))/2; ⋮; (f(tₙ₋₁) + f(tₙ))/2],

which is lower bidiagonal.

Now if we wish to use forward substitution in a vector linear problem, we can derive in much the same way as above:

(I/h − A(tₖ₊₁)/2)𝐮ₖ₊₁ − (I/h + A(tₖ)/2)𝐮ₖ = (𝐟(tₖ₊₁) + 𝐟(tₖ))/2,

to make the update equation,

𝐮ₖ₊₁ = (I − (h/2)A(tₖ₊₁))⁻¹((I + (h/2)A(tₖ))𝐮ₖ + (h/2)(𝐟(tₖ₊₁) + 𝐟(tₖ))),

with initial value,

u1=c.

Problem 3.3 (B) Implement the method designed in Problem 3.1 for the negative time Airy equation

u(0) = 1, u'(0) = 0, u''(t) = −t·u(t)

on [0,50]. How many time-steps are needed to get convergence to 1% accuracy (the "eyeball norm")?

SOLUTION We will work with 𝐮(t) = [u(t); u'(t)], so that our differential equation is:

𝐮'(t) = [u'(t); u''(t)] = [0 1; −t 0] 𝐮(t),

so that,

A(t) = [0 1; −t 0],

and with initial conditions,

𝐮(0) = [1; 0].

We will use the method described in Problem 3.1, with 𝐟(t) = 0:

𝐮₁ = [1; 0],  𝐮ₖ₊₁ = (I − (h/2)A(tₖ₊₁))⁻¹(I + (h/2)A(tₖ))𝐮ₖ.

To see when the error goes below 1%, consider the below:
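The implementation and plots were lost in compilation; a hedged NumPy sketch of the scheme above (the function name is ours):

```python
import numpy as np

def airy_trapezoidal(T, n):
    """The averaged scheme for u'' = -t u, u(0) = 1, u'(0) = 0,
    as the vector system u' = A(t)u with A(t) = [0 1; -t 0]."""
    h = T / n
    t = np.linspace(0, T, n + 1)
    u = np.array([1.0, 0.0])
    I = np.eye(2)
    sol = [u[0]]
    for k in range(n):
        A0 = np.array([[0.0, 1.0], [-t[k], 0.0]])
        A1 = np.array([[0.0, 1.0], [-t[k + 1], 0.0]])
        u = np.linalg.solve(I - h / 2 * A1, (I + h / 2 * A0) @ u)
        sol.append(u[0])
    return t, np.array(sol)
```

Near t = 0 the Taylor expansion u(t) = 1 − t³/6 + t⁶/180 − ⋯ provides a check; on [0, 50] one increases n until the plotted solution stops changing at the 1% level.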

Problem 3.4 (A) Implement Heat on a graph with m=50 nodes with no forcing and initial condition um/2(0)=1 and uk(0)=0, but where the first and last node are fixed to zero, that is replace the first and last rows of the differential equation with the conditions:

u1(t)=um(t)=0.

Find the time t such that 𝐮(t)<103 to 2 digits. Hint: Differentiate to recast the conditions as a differential equation. Vary n, the number of time-steps used in Forward Euler, and increase T in the interval [0,T] until the answer doesn't change. Do a for loop over the time-slices to find the first that satisfies the condition. (You may wish to call println to print the answer and break to exit the for loop).

SOLUTION

Following the hint, we will begin by writing a function called heat_dissipation(T), which runs a simulation of the heat equation with the specified conditions up to time T. If the condition u(t)<103 is met at a time t<T, then it will return t, else it will return T. We choose the value n=1000 not too large so that we can run this on a large range of values for T. Also note that we use Backward Euler, which is more stable for smaller values of n; T can potentially be quite large, so Forward Euler may be unstable for even moderately large values of n.
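The Julia code for heat_dissipation was lost in compilation; a small-scale hedged NumPy sketch of the same simulation (names and the m = 50 path-graph Laplacian setup are ours, matching the problem statement):

```python
import numpy as np

def heat_max_norm(T=10.0, n=1000, m=50):
    """Backward Euler for u' = -Δu on a path graph, with the first and last
    nodes pinned to zero and an initial spike at the middle node.
    Returns the max-norm of u at each time-slice."""
    h = T / n
    D = np.zeros((m, m))                 # graph Laplacian of the path
    np.fill_diagonal(D, 2.0)
    idx = np.arange(m - 1)
    D[idx, idx + 1] = -1.0
    D[idx + 1, idx] = -1.0
    B = np.eye(m) + h * D                # backward Euler: (I + hΔ)u_{k+1} = u_k
    B[0, :] = 0.0; B[0, 0] = 1.0         # replace first/last rows by u₁ = u_m = 0
    B[-1, :] = 0.0; B[-1, -1] = 1.0
    u = np.zeros(m)
    u[m // 2] = 1.0
    norms = [np.abs(u).max()]
    for _ in range(n):
        rhs = u.copy()
        rhs[0] = rhs[-1] = 0.0
        u = np.linalg.solve(B, rhs)
        norms.append(np.abs(u).max())
    return np.array(norms)
```

The full experiment then scans T (and a much larger n) for the first time-slice with ‖𝐮(t)‖∞ < 10⁻³.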

We run this on a large range of values for T. The function returns approximately constant (905) values when T>905, so this suggests that our answer lies somewhere around 905.

Plotting, we can clearly see that the time output by the function becomes the same towards the end of our range, so we will restrict our search to the end of this range.

Zooming in:

This looks promising, but it seems like the time-output is somewhat unstable even after T is large enough. Inspecting the actual values of the output, we see that this is likely due to the step size we are using - it will be different for different values of T (as h = T/n), and so the smallest t in the discretised range may jump up and down if n is not large enough. To be sure of the answer to 2 decimal places, we will need n to be larger than 2T/0.01 ≈ 180000. We will redefine our function with n = 200000, and run it on a few different values of T (that are definitely larger than our target time) to be sure we get the same answer to 2 decimal places.

We can see that each time we get 902.38 to 2 decimal places, so this is our answer.

Problem 3.5 (B) Consider the equation

u(0) = 1,  u'(t) = −10u(t)

What behaviour do you observe on [0,10] for forward Euler, backward Euler, and the method of Problem 3.1, with a step-size of 0.5? What happens if you decrease the step-size to 0.01? (Hint: you may wish to do a plot and scale the y-axis logarithmically.)

SOLUTION

We observe that for the stepsize h=0.5, the forward method blows up while the other methods appear to converge.
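The plotting code was lost; the behaviour can be reproduced with a short NumPy sketch, since for this scalar equation each step is just a multiplication:

```python
import numpy as np

def euler_methods(h, T=10.0):
    """Forward and backward Euler for u(0) = 1, u' = -10u."""
    n = int(round(T / h))
    fwd = bwd = 1.0
    for _ in range(n):
        fwd *= 1 - 10 * h     # forward step: u_{k+1} = (1 - 10h) u_k
        bwd /= 1 + 10 * h     # backward step: (1 + 10h) u_{k+1} = u_k
    return fwd, bwd

print(euler_methods(0.5))    # forward factor |1 - 5| = 4 > 1: blow-up
print(euler_methods(0.01))   # both decay
```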

For a smaller stepsize (h=0.01), the forward method is also able to converge.

MATH50003 Numerical Analysis: Problem Sheet 7

This problem sheet explores two-point boundary value problems, convergence of discretisations, and Fourier series.

Questions marked with a ⋆ are meant to be completed without using a computer. Problems are denoted A/B/C to indicate their difficulty.

 

1. Two-point boundary value problems

Problem 1.1 (C) Construct a finite-difference approximation to the forced Helmholtz equation

u(0) = 0,  u(1) = 0,  u'' + k²u = eˣ

and find an n such the error is less than 104 when compared with the true solution for k=10:

u(x)=(cos(kx)+excos(kx)2+cot(k)sin(kx)ecos(k)cot(k)sin(kx)esin(k)sin(kx)+exsin(kx)2)/(1+k2)
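No solution cell survives for this problem; a hedged NumPy sketch of the finite-difference discretisation (one then decreases h until the error against the true solution is below 10⁻⁴):

```python
import numpy as np

def helmholtz_fd(n, k=10.0):
    """Central differences for u'' + k²u = eˣ, u(0) = u(1) = 0."""
    x = np.linspace(0, 1, n + 1)
    h = 1 / n
    m = n - 1                               # interior points
    A = np.zeros((m, m))
    np.fill_diagonal(A, k**2 - 2 / h**2)
    idx = np.arange(m - 1)
    A[idx, idx + 1] = 1 / h**2
    A[idx + 1, idx] = 1 / h**2
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, np.exp(x[1:-1]))
    return x, u

x, u = helmholtz_fd(1000)
print(u[500])   # value at x = 1/2; refine n until this stops changing at 1e-4
```

(A production version would use a banded/tridiagonal solver rather than a dense matrix, as in the SymTridiagonal problems earlier.)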

Problem 1.2 (B) Discretisations can also be used to solve eigenvalue equations. Consider the Schrödinger equation with quadratic oscillator:

u(−L) = u(L) = 0,  −u'' + x²u = λu

Approximate the eigenvalues using eigvals(A) (which returns the eigenvalues of a matrix A) with L=10. Can you conjecture their exact value if L=? (Hint: they are integers and the eigenvalues closest to zero are most accurate.)

SOLUTION

On inspection of the smallest values, it seems that the positive odd integers are the eigenvalues for L=. Increasing L (and also n) it becomes more obvious:
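The Julia eigvals experiment was lost; a hedged NumPy sketch of the discretisation (dense here for simplicity):

```python
import numpy as np

def oscillator_eigs(L=10.0, n=1000, count=5):
    """Discretise -u'' + x²u = λu with u(±L) = 0; return the smallest eigenvalues."""
    x = np.linspace(-L, L, n + 1)[1:-1]   # interior points
    h = 2 * L / n
    m = len(x)
    A = np.zeros((m, m))
    np.fill_diagonal(A, 2 / h**2 + x**2)
    idx = np.arange(m - 1)
    A[idx, idx + 1] = -1 / h**2
    A[idx + 1, idx] = -1 / h**2
    return np.linalg.eigvalsh(A)[:count]

print(oscillator_eigs())   # ≈ 1, 3, 5, 7, 9: the positive odd integers
```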

Problem 1.3⋆ (A) Consider Helmholtz with Neumann conditions:

u'(0) = c₀,  u'(1) = c₁,  u_xx + k²u = f(x)

Write down the finite difference approximation approximating u(xₖ) ≈ uₖ on an evenly spaced grid xₖ = (k−1)/(n−1) for k = 1, …, n, using the first order derivative approximation conditions:

u'(0) ≈ (u₂ − u₁)/h = c₀,  u'(1) ≈ (uₙ − uₙ₋₁)/h = c₁

Use pivoting to reduce the equation to one involving a symmetric tridiagonal matrix.

SOLUTION

We have, with u(xk)=uk (and using κ instead of k in the equation uxx+k2u=f(x) so as to avoid confusion with the indices):

(u₂ − u₁)/h = c₀,  (uₖ₋₁ − 2uₖ + uₖ₊₁)/h² + κ²uₖ = f(xₖ) for k = 2:n−1,  (uₙ − uₙ₋₁)/h = c₁,

which we write in matrix form as:

[−1/h 1/h; 1/h² (κ² − 2/h²) 1/h²; ⋱ ⋱ ⋱; 1/h² (κ² − 2/h²) 1/h²; −1/h 1/h] 𝐮 = [c₀; f(x₂); ⋮; f(xₙ₋₁); c₁],

which we can make symmetric tridiagonal by multiplying the first row by 1/h and the final row by −1/h:

[−1/h² 1/h²; 1/h² (κ² − 2/h²) 1/h²; ⋱ ⋱ ⋱; 1/h² (κ² − 2/h²) 1/h²; 1/h² −1/h²] 𝐮 = [c₀/h; f(x₂); ⋮; f(xₙ₋₁); −c₁/h],

2. Convergence

Problem 2.1⋆ (B) For the equation

u(0) = c₀,  u' + au = f(x)

where a and 0x1, prove convergence as n for the method constructed in PS6 using the approximation where we take the average of the two grid points:

u(xk+1)+u(xk)2uk+1ukh.

SOLUTION Using the approximation from PS6 we obtain

(f(xₖ₊₁) + f(xₖ))/2 = (u'(xₖ₊₁) + u'(xₖ))/2 + a(u(xₖ₊₁) + u(xₖ))/2 ≈ (uₖ₊₁ − uₖ)/h + auₖ₊₁/2 + auₖ/2

So we get

(a/2 − 1/h)uₖ + (a/2 + 1/h)uₖ₊₁ = (f(xₖ₊₁) + f(xₖ))/2

We want to prove that supk=1,...,n1|u(xk)uk| converges to 0 as n.

Take û = [u₀, …, uₙ₋₁]ᵀ and rewrite the system as

L̂û = [c₀; 𝐟̂]

where 𝐟̂ₖ = (f(xₖ) + f(xₖ₋₁))/2 for k = 1, …, n−1, and

L̂ = [1; (a/2 − 1/h) (a/2 + 1/h); ⋱ ⋱; (a/2 − 1/h) (a/2 + 1/h)]

Note that L^ is lower bidiagonal.

Now, similarly to Euler's methods convergence theorem, we study consistency and stability.

Consistency:

Our discretisation approximates the true equation.

L^u=[c0u(x1)u(x0)h+a2(u(x1)+u(x0))u(xn1)u(xn2)h+a2(u(xn1)+u(xn2))]=[c012(u(x1)u(x0)h+u(x1)u(x0)h+a(u(x1)+u(x0)))12(u(xn1)u(xn2)h+u(x1)u(x0)h+a(u(xn1)+u(xn2)))]=[c012(u(x0)+au(x0)+u(τ0)h+u(x1)+au(x1)+u(σ1)h)12(u(xn2)+au(xn2)+u(τn2)h+u(xn1)+au(xn1)+u(σn1)h)]=[c0f(x0)+f(x1)2+u(τ0)+u(σ1)2hf(xn2)+f(xn1)2+u(τn2)+u(σn1)2h]=[c0f^]+[0δ]

where xkτk,σkxk+1, and uniform boundedness implies that δ=O(h)

Stability:

The inverse does not blow up the error.

L^=[1(a2+1h)1(a2+1h)1]D[1(a2+1h)(a21h)1(a2+1h)(a21h)1]L

Thus, we have L11|a241h2|(n1)|a24n2|(n1)|4a2+4n2|n1=O(1)

Note that |a2+1h|1=|hah2+1|2h, as h0 (taking h small enough)

Combining stability and consistency we have 𝐮𝐮=L^1(L^𝐮L^𝐮)=L1D1[0δ]2hL11δ1=O(h)

Problem 2.2⋆ (A) Consider the matrices

L = [1; −a₁ 1; ⋱ ⋱; −aₙ₋₁ 1],  T = [1; −a 1; ⋱ ⋱; −a 1].

By writing down the inverse explicitly prove that if |ak|a then

L11T11.

Use this to prove convergence as n of forward Euler for

u(0)=c0u(x)a(x)u(x)=f(x)

SOLUTION

Since

L⁻¹ = [1; a₁ 1; a₁a₂ a₂ 1; a₁a₂a₃ a₂a₃ a₃ 1; ⋮ ⋱; (∏ᵢ₌₁ⁿ⁻¹ aᵢ) (∏ᵢ₌₂ⁿ⁻¹ aᵢ) ⋯ aₙ₋₁ 1]

and

T⁻¹ = [1; a 1; a² a 1; a³ a² a 1; ⋮ ⋱; aⁿ⁻¹ aⁿ⁻² ⋯ a 1]

Then, x

T1x=maxi|(T1x)i|=maxi|xi+j=1i1aijxj|={1if a[0,1]an1if a1

since, given b=max{1,a},

maxi|xi+j=1i1aijxj|maxi(|xi|+j=1i1|aijxj|)bnj=1n|xj|=bnx1

thus,

T11=supx0T1xx1bn and, in particular, T11=bn

since |T1x|x1=bn it is obtained using x={e1b=1enb=a

Moreover, |aj|b, j=1,...,n, thus,

L1x=maxi|(L1x)i|=maxi|xi+j=1i1aj...ai1xj|maxi|xi|+j=1i1|aj...ai1xj|bnx1

Hence,

L11=supxL1xx1bn=T11

Now we prove convergence for the forward Euler method as n for

u(0)=c0u(x)=a(x)u(x)+f(x)

Using equidistanced (with step h) points x0,...,xn1, we use the approximations u(xk)uk, where u0=c0 and uk+1=uk+h(a(xk)uk+f(xk))

In order to study convergence we consider the limit as n of supi=1,...,n1|uiu(xi)|

Similarly to Euler's methods convergence theorem, we study consistency and stability.

In order to apply the theorem we note that we can define ak=a(xk), k=1,...n1 and we have that for every k, |ak|a:=maxi=1,n1|ai|.

Consistency:

Our discretisation approximates the true equation.

L^u=[c0u(x1)u(x0)ha1u(x0)u(xn1)u(xn2)han1u(xn2)]=[c0u(x0)a1u(x0)+u(τ0)hu(xn2)an1u(xn2)+u(τn2)h]=[c0f(x0)+u(τ0)hf(xn2)+u(τn2)h]=[c0𝐟]+[0δ]

where xkτkxk+1, and uniform boundedness implies that δ=O(h)

Stability:

The inverse does not blow up the error. First write, for lk=1ak

L^=[1h1h1]D[1l11ln11]L

Thus, we have L11T11=O(1)

Combining stability and consistency we have 𝐮𝐮=L^1(L^𝐮L^𝐮)=L1D1[0δ]hL11δ1=O(h)

 

 

3. Fourier series

Problem 3.1⋆ (C) Give explicit formulae for f̂k and f̂kn for the following functions:

cos θ,  cos 4θ,  sin⁴θ,  3/(3 − e^{iθ}),  1/(1 − 2e^{iθ})

Hint: You may wish to try the change of variables z=eiθ.

SOLUTION The explicit formulae for f̂ can be found by direct computation. Making use of trigonometric identities and noting that cos x and sin x are even and odd respectively on a periodic interval, hence orthogonal

  1. f̂ₖ = (1/2π)∫₀²π cos θ e^{−ikθ} dθ = (1/2π)∫₀²π cos θ cos kθ dθ

    For k = ±1, this yields

    f̂₊₁ = f̂₋₁ = (1/2π)∫₀²π cos²θ dθ = 1/2

    and for k ≠ ±1, using a standard trigonometric identity,

    f̂ₖ = (1/4π)∫₀²π [cos((1−k)θ) + cos((1+k)θ)] dθ = 0
  2. Similar to 1), replacing the cases k = ±1 by k = ±4.

  3. Either use Euler's formula to expand sin⁴θ into exponentials, or use the identity sin⁴θ = (3 − 4cos 2θ + cos 4θ)/8.

    Noting that sin⁴θ is an even function yields f̂ₖ = (1/2π)∫₀²π sin⁴θ e^{−ikθ} dθ = (1/2π)∫₀²π sin⁴θ cos kθ dθ. Using the above identity, the coefficients vanish for k ≠ 0, ±2, ±4, as we have seen in 1). For the special cases we have f̂₀ = 3/8, f̂±₂ = −1/4 and f̂±₄ = 1/16.

  4. We will have to separate the cases k ≥ 0 and k < 0.

(k ≥ 0) The substitution z = e^{iθ} turns the Fourier coefficient into a contour integral over the unit circle, f̂ₖ = (1/2πi)∮_{|z|=1} 3/((3 − z)z^{k+1}) dz; the only pole inside the contour is at z = 0, and the residue theorem yields f̂ₖ = 3^{−k}.

(k < 0) Now z^{−k−1} is analytic, and the only pole, z = 3, lies outside the contour of radius 1, hence f̂ₖ = 0 for all k < 0.

  5. Again we separate the cases k ≥ 0 and k < 0. Since |2e^{iθ}| > 1, we expand in powers of e^{−iθ}:

1/(1 − 2e^{iθ}) = −(1/(2e^{iθ}))·1/(1 − 1/(2e^{iθ})) = −∑_{j≥1} 2^{−j}e^{−ijθ},

hence f̂ₖ = 0 for k ≥ 0 and f̂ₖ = −2^{−|k|} for k < 0.

For the student unfamiliar with complex analysis:
4*) Substitute for z=eiθ3 such that f(θ)=11z then write the function as a geometric series Σjzj plug in

f̂=12π3j02πΣjeiθ(jk)dθ

and finally use the Lemma "Discrete orthogonality" from the lecture notes to show the result above.

5*) Similar idea to 4) however notice that this time we only have non-zero contributions for k<0.

Problem 3.2⋆ (B) Prove that if the first λ − 1 derivatives f(θ), f'(θ), …, f^{(λ−1)}(θ) are 2π-periodic and f^{(λ)} is uniformly bounded, then

|f̂ₖ| = O(|k|^{−λ})  as |k| → ∞.

Use this to show for the Taylor case (0 = f̂₋₁ = f̂₋₂ = ⋯) that

|f(θ) − ∑ₖ₌₀ⁿ⁻¹ f̂ₖ e^{ikθ}| = O(n^{1−λ})

SOLUTION A straightforward application of integration by parts (λ times, using periodicity to discard the boundary terms) yields the result:

f̂ₖ = (1/2π)∫₀²π f(θ)e^{−ikθ} dθ = 1/(2π(ik)^λ) ∫₀²π f^{(λ)}(θ)e^{−ikθ} dθ = O(|k|^{−λ}), given that f^{(λ)} is uniformly bounded. The second part follows directly from this result:

|∑ₖ₌ₙ^∞ f̂ₖ e^{ikθ}| ≤ ∑ₖ₌ₙ^∞ |f̂ₖ| ≤ C ∑ₖ₌ₙ^∞ k^{−λ} = O(n^{1−λ})

for some constant C.

Problem 3.3⋆ (C) If f is a trigonometric polynomial (f̂ₖ = 0 for |k| > m), show that for n ≥ 2m + 1 we can exactly recover f:

f(θ) = ∑ₖ₌₋ₘᵐ f̂ₖⁿ e^{ikθ}

SOLUTION This proof is nearly identical to the proof of "Theorem (Taylor series converges)" in the lecture notes. Only now one has to also subtract the negative coefficients from the negative approximate coefficients in the chain of arguments.

Problem 3.4⋆ (B) For the general (non-Taylor) case and n = 2m + 1, prove convergence of

f₋ₘ:ₘ(θ) := ∑ₖ₌₋ₘᵐ f̂ₖⁿ e^{ikθ}

to f(θ) as n → ∞. What is the rate of convergence if the first λ − 1 derivatives f(θ), f'(θ), …, f^{(λ−1)}(θ) are 2π-periodic and f^{(λ)} is uniformly bounded?

SOLUTION

Observe that by aliasing (see corollary in lecture notes) and triangle inequality we have the following

|f̂ₖⁿ − f̂ₖ| ≤ ∑ₚ₌₁^∞ (|f̂ₖ₊ₚₙ| + |f̂ₖ₋ₚₙ|)

Using the result from Problem 3.2 yields

|f̂ₖⁿ − f̂ₖ| ≤ (C/n^λ) ∑ₚ₌₁^∞ [1/(p + k/n)^λ + 1/(p − k/n)^λ]

now we pick |q| < 1/2 (so that the estimate below holds for both summands above) and construct an integral with convex, monotonically decreasing integrand such that

(p + q)^{−λ} < ∫_{p−1/2}^{p+1/2} (x + q)^{−λ} dx

moreover, summing the left-hand side over p from 1 to ∞ yields a bound by the integral:

∫_{1/2}^∞ (x + q)^{−λ} dx = (1/(λ−1)) (1/2 + q)^{1−λ}

Finally, let q = ±k/n to achieve the rate of convergence

|f̂ₖⁿ − f̂ₖ| ≤ (C_λ/n^λ) ((1/2 + k/n)^{1−λ} + (1/2 − k/n)^{1−λ})

where C_λ is a constant depending on λ. Note that it is indeed important to split the n coefficients equally over the negative and positive frequencies, as stated in the notes, due to the estimate we used above.